Preface


Scientists, engineers and the like are a strange lot. Unperturbed by societal norms, 
they direct their energies to finding better alternatives to existing theories and 
concocting solutions to unsolved problems. Driven by an insatiable curiosity, they 
record their observations and crunch the numbers. This tome is about the science 
of crunching. It’s about digging out something of value from the detritus that 
others tend to leave behind. The described approaches involve constructing 
models to process the available data. Smoothing entails revisiting historical 
records in an endeavour to understand something of the past. Filtering refers to 
estimating what is happening currently, whereas prediction is concerned with 
hazarding a guess about what might happen next. 


The basics of smoothing, filtering and prediction were worked out by Norbert 
Wiener, Rudolf E. Kalman, Richard S. Bucy and others over half a century ago. This
book describes the classical techniques together with some more recently 
developed embellishments for improving performance within applications. Its 
aims are threefold. First, to present the subject in an accessible way, so that it can 
serve as a practical guide for undergraduates and newcomers to the field. Second, 
to differentiate between techniques that satisfy performance criteria versus those 
relying on heuristics. Third, to draw attention to Wiener’s approach for optimal 
non-causal filtering (or smoothing). 


Optimal estimation is routinely taught at a post-graduate level while not 
necessarily assuming familiarity with prerequisite material or backgrounds in an 
engineering discipline. That is, the basics of estimation theory can be taught as a 
standalone subject. In the same way that a vehicle driver does not need to 
understand the workings of an internal combustion engine or a computer user does 
not need to be acquainted with its inner workings, implementing an optimal filter 
is hardly rocket science. Indeed, since the filter recursions are all known, its
operation is no different from pushing a button on a calculator. The key to obtaining
good estimator performance is developing intimacy with the application at hand, 
namely, exploiting any available insight, expertise and a priori knowledge to 
model the problem. If the measurement noise is negligible, any number of 
solutions may suffice. Conversely, if the observations are dominated by 
measurement noise, the problem may be too hard. Experienced practitioners are 
able to recognise those intermediate sweet-spots where cost-benefits can be realised.


Systems employing optimal techniques pervade our lives. They are embedded 
within medical diagnosis equipment, communication networks, aircraft avionics, 
robotics and market forecasting — to name a few. When tasked with new problems, 
in which information is to be extracted from noisy measurements, one can be 
faced with a plethora of algorithms and techniques. Understanding the 
performance of candidate approaches may seem unwieldy and daunting to 
novices. Therefore, the philosophy here is to present the linear-quadratic-Gaussian 
results for smoothing, filtering and prediction with accompanying proofs about 
performance being attained, wherever this is appropriate. Unfortunately, this does 
require some maths, which trades off accessibility. The treatment is a little repetitive
and may seem trite, but hopefully it contributes an understanding of the conditions 
under which solutions can value-add. 


Science is an evolving process where what we think we know is continuously 
updated with refashioned ideas. Although evidence suggests that Babylonian 
astronomers were able to predict planetary motion, a bewildering variety of Earth 
and universe models followed. According to lore, ancient Greek philosophers such 
as Aristotle assumed a geocentric model of the universe and about two centuries 
later Aristarchus developed a heliocentric version. It is reported that Eratosthenes 
arrived at a good estimate of the Earth’s circumference, yet there was a revival of 
flat earth beliefs during the middle ages. Not all ideas are welcomed - Galileo was 
famously incarcerated for knowing too much. Similarly, newly-appearing signal 
processing techniques compete with old favourites. An aspiration here is to
publicise the oft-forgotten approach of Wiener which, in concert with
Kalman’s, leads to optimal smoothers. The ensuing results contrast with
traditional solutions and may not sit well with more orthodox practitioners. 


Kalman’s optimal filter results were published in the early 1960s and various 
techniques for smoothing in a state-space framework were developed shortly 
thereafter. Wiener’s optimal smoother solution is less well known, perhaps 
because it was framed in the frequency domain and described in the archaic 
language of the day. His work of the 1940s was born of an analog world where
filters were made exclusively of lumped circuit components. At that time, 
computers referred to people labouring with an abacus or an adding machine — 
Alan Turing’s and John von Neumann’s ideas had yet to be realised. In his book, 
Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiener 
wrote with little fanfare and dubbed the smoother “unrealisable”. The use of the 
Wiener-Hopf factor allows this smoother to be expressed in a time-domain state- 
space setting and included alongside other techniques within the designer’s 
toolbox. 


A model-based approach is employed throughout where estimation problems are 
defined in terms of state-space parameters. I recall attending Michael Green’s 
robust control course, where he referred to a distillation column control problem 
competition, in which a student’s robust low-order solution out-performed a senior 
specialist’s optimal high-order solution. It is hoped that this text will equip readers 
to do similarly, namely: make some simplifying assumptions, apply the standard 
solutions and back-off from optimality if uncertainties degrade performance. 


Both continuous-time and discrete-time techniques are presented. Sometimes the 
state dynamics and observations may be modelled exactly in continuous-time. In 
the majority of applications, some discrete-time approximations and processing of 
sampled data will be required. The material is organised as an eleven-lecture 
course. 


• Chapter 1 introduces some standard continuous-time fare such as the
Laplace Transform, stability, adjoints and causality. A completing-the-square
approach is then used to obtain the minimum-mean-square error
(or Wiener) filtering solutions.


• Chapter 2 deals with discrete-time minimum-mean-square error filtering.
The treatment is somewhat brief since the developments follow
analogously from the continuous-time case.


• Chapter 3 describes continuous-time minimum-variance (or Kalman-Bucy)
filtering. The filter is found using the conditional mean or
least-mean-square-error formula. It is shown for time-invariant problems that
the Wiener and Kalman solutions are the same.


• Chapter 4 addresses discrete-time minimum-variance (or Kalman)
prediction and filtering. Once again, the optimum conditional mean
estimate may be found via the least-mean-square-error approach.
Generalisations for missing data, deterministic inputs, correlated noises,
direct feedthrough terms, output estimation and equalisation are
described.


• Chapter 5 simplifies the discrete-time minimum-variance filtering results
for steady-state problems. Discrete-time observability, Riccati equation
solution convergence, asymptotic stability and Wiener filter equivalence
are discussed.


• Chapter 6 covers the subject of continuous-time smoothing. The main
fixed-lag, fixed-point and fixed-interval smoother results are derived. It is
shown that the minimum-variance fixed-interval smoother attains the best
performance.


• Chapter 7 is about discrete-time smoothing. It is observed that the
fixed-point, fixed-lag and fixed-interval smoothers outperform the Kalman filter.
Once again, the minimum-variance smoother attains the best-possible
performance, provided that the underlying assumptions are correct.


• Chapter 8 attends to parameter estimation. As the above-mentioned
approaches all rely on knowledge of the underlying model parameters,
maximum-likelihood estimation techniques and expectation-maximisation
algorithms are described. The addition of a correction term
yields unbiased, consistent state-space parameter estimates that attain the
Cramer-Rao Lower Bounds.


• Chapter 9 is concerned with robust techniques that accommodate
uncertainties within problem specifications. An extra term within the
design Riccati equations enables designers to trade-off average error and
peak error performance.


• Chapter 10 applies the afore-mentioned linear techniques to nonlinear
estimation problems. It is demonstrated that step-wise linearisations can
be used within predictors, filters and smoothers, albeit by forsaking
optimal performance guarantees.


• Chapter 11 rounds off the course by exploiting knowledge about
transition probabilities. HMM, minimum-variance-HMM and high-order
minimum-variance-HMM filters and smoothers are derived. The
improved performance offered by these techniques needs to be reconciled 
against the significantly higher calculation overheads. 


The foundations are laid in Chapters 1 — 2, which explain minimum-mean-square- 
error solution construction and asymptotic behaviour. In single-input-single-output 
cases, finding Wiener filter transfer functions may have appeal. In general, 
designing Kalman filters is more tractable because solving a Riccati equation is 
easier than pole-zero cancellation. Kalman filters are needed if the signal models 
are time-varying. The filtered states can be updated via a one-line recursion but 
the gain may need to be re-evaluated at each step in time. Extended Kalman
filters are contenders if nonlinearities are present. Smoothers are advocated when 
better performance is desired and some calculation delays can be tolerated. Using 
additional transition probability information can be advantageous wherever 
measurements exhibit reoccurring patterns. 


This book elaborates on several articles published in IEEE journals and I am
grateful to my collaborators and the anonymous reviewers who have improved my 
efforts over the years. Lang White continues to teach and motivate me to this day. 
The great people at the CSIRO, such as David Hainsworth and George Poropat,
generously make themselves available to anglicise my engineering jargon.
Sometimes posing good questions is helpful; for example, Paul Malcolm once
asked “is it stable?” which led down fruitful paths. During a seminar at HSU,
Udo Zoelzer provided the impulse for me to undertake this project. My sources of 
inspiration include interactions at the CDC meetings - thanks particularly to 
Dennis Bernstein whose passion for writing has motivated me along the way. 


Garry Einicke, 
CSIRO Australia 
1. Continuous-Time, Minimum-Mean-Square-Error Filtering


1.1 Introduction


Optimal filtering is concerned with designing the best linear system for recovering 
data from noisy measurements. It is a model-based approach requiring knowledge 
of the signal generating system. The signal models, together with the noise 
statistics are factored into the design in such a way to satisfy an optimality 
criterion, namely, minimising the square of the error. 


A prerequisite technique, the method of least-squares, has its origin in curve 
fitting. Amid some controversy, Kepler claimed in 1609 that the planets move 
around the Sun in elliptical orbits [1]. Carl Friedrich Gauss arrived at a better
performing method for fitting curves to astronomical observations and predicting 
planetary trajectories in 1799 [1]. He formally published a least-squares 
approximation method in 1809 [2], which was developed independently by 
Adrien-Marie Legendre in 1806 [1]. This technique was famously used by 
Giuseppe Piazzi to discover and track the asteroid Ceres using a least-squares
analysis which was easier than solving Kepler’s complicated nonlinear equations 
of planetary motion [1]. Andrey N. Kolmogorov refined Gauss’s theory of least- 
squares and applied it for the prediction of discrete-time stationary stochastic 
processes in 1939 [3]. Norbert Wiener, a faculty member at MIT, independently 
solved analogous continuous-time estimation problems. He worked on defence 
applications during the Second World War and produced a report entitled 
Extrapolation, Interpolation and Smoothing of Stationary Time Series in 1943. 
The report was later published as a book in 1949 [4]. 


Wiener derived two important results, namely, the optimum (non-causal) 
minimum-mean-square-error solution and the optimum causal minimum-mean- 
square-error solution [4] — [6]. The optimum causal solution has since become 
known as the Wiener filter and in the time-invariant case is equivalent to the
Kalman filter that was developed subsequently. Wiener pursued practical 
outcomes and attributed the term “unrealisable filter” to the optimal non-causal 
solution because “it is not in fact realisable with a finite network of resistances, 
capacities, and inductances” [4]. Wiener’s unrealisable filter is actually the 
optimum linear smoother. 


“All men by nature desire to know.” Aristotle 
The optimal Wiener filter is calculated in the frequency domain. Consequently,
Section 1.2 touches on some frequency-domain concepts. In particular, the notions 
of spaces, state-space systems, transfer functions, canonical realisations, stability, 
causal systems, power spectral density and spectral factorisation are introduced. 
The Wiener filter is then derived by minimising the square of the error. Three 
cases are discussed in Section 1.3. First, the solution to the general estimation
problem is stated. Second, the general estimation results are specialised to output
estimation. Third, the optimal input estimation or equalisation solution is described.
An example, demonstrating the recovery of a desired signal from noisy 
measurements, completes the chapter. 


1.2 Prerequisites 


1.2.1 Signals 


Let $(.)^T$ denote the transpose operator. Consider a continuous-time, stochastic (or
random) signal $w(t) = [w_1(t), w_2(t), \ldots, w_n(t)]^T$, with $w_i(t) \in \mathbb{R}$, $i = 1, \ldots, n$,
which belongs to the space $\mathbb{R}^n$, or more concisely $w(t) \in \mathbb{R}^n$. In general, where $n > 1$,
the set of $w(t)$ over all time $t$ is denoted by the matrix $w = [w(-\infty), \ldots, w(\infty)]$.
Often the focus is on scalar signals, in which case $w = [w(-\infty), \ldots, w(\infty)]$ is a row
vector.


1.2.2 Elementary Functions Defined on Signals 


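The inner product $\langle v, w \rangle$ of two continuous-time signal vectors $v$ and $w$ is defined
by

$$\langle v, w \rangle = \int_{-\infty}^{\infty} v^T w \, dt. \tag{1}$$

The 2-norm or Euclidean norm of a continuous-time signal vector $w$ is defined as
$\|w\|_2 = \sqrt{\langle w, w \rangle} = \sqrt{\int_{-\infty}^{\infty} w^T w \, dt}$. The square of the 2-norm, that is,
$\|w\|_2^2 = \langle w, w \rangle = \int_{-\infty}^{\infty} w^T w \, dt$, is commonly known as the energy of the signal $w$.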


“Scientific discovery consists in the interpretation for our own convenience of a system of existence
which has been made with no eye to our convenience at all.” Norbert Wiener
1.2.3 Spaces


The Lebesgue 2-space, defined as the set of continuous-time signals having finite
2-norm, is denoted by $\mathcal{L}_2$. Thus, $w \in \mathcal{L}_2$ means that the energy of $w$ is bounded.
The following properties hold for 2-norms.

(i) $\|v\|_2 = 0 \Leftrightarrow v = 0$.
(ii) $\|\alpha v\|_2 = |\alpha| \, \|v\|_2$.
(iii) $\|v + w\|_2 \le \|v\|_2 + \|w\|_2$, which is known as the triangle inequality.
(iv) $\|vw\|_2 \le \|v\|_2 \, \|w\|_2$.
(v) $|\langle v, w \rangle| \le \|v\|_2 \, \|w\|_2$, which is known as the Cauchy-Schwarz inequality.

See [8] for more detailed discussions of spaces and norms.
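As a rough numerical illustration, the inner product (1) and the 2-norm can be approximated for sampled signals by a Riemann sum. The sketch below uses Python with NumPy; the function names `inner_product` and `two_norm`, the two test signals and the sampling interval are illustrative assumptions and do not come from the text.

```python
import numpy as np

def inner_product(v, w, dt):
    """Riemann-sum approximation of <v, w> = integral of v^T w dt.

    v and w are arrays of shape (N, n): N time samples of n-vectors.
    """
    return float(np.sum(v * w) * dt)

def two_norm(w, dt):
    """Approximate 2-norm, i.e. the square root of the signal energy."""
    return float(np.sqrt(inner_product(w, w, dt)))

dt = 1e-3
t = np.arange(0.0, 10.0, dt)
v = np.exp(-t)[:, None]                        # scalar signal e^{-t}
w = (np.exp(-2.0 * t) * np.sin(t))[:, None]    # a second finite-energy signal

# Triangle inequality (iii) and Cauchy-Schwarz inequality (v)
triangle_ok = two_norm(v + w, dt) <= two_norm(v, dt) + two_norm(w, dt)
cauchy_schwarz_ok = abs(inner_product(v, w, dt)) <= two_norm(v, dt) * two_norm(w, dt)
print(triangle_ok, cauchy_schwarz_ok, two_norm(v, dt) ** 2)
```

The energy of $e^{-t}$ over $t \ge 0$ is 1/2, so the last printed value should be close to 0.5, up to truncation and discretisation error.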


1.2.4 Linear Systems 


A linear system is defined as having an output vector which is equal to the value 
of a linear operator applied to an input vector. That is, the relationships between 
the output and input vectors are described by linear equations, which may be 
algebraic, differential or integral. Linear time-domain systems are denoted by 
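upper-case script fonts. Consider two linear systems $\mathcal{G}: w \mapsto \mathcal{G}w$ and
$\mathcal{H}: w \mapsto \mathcal{H}w$, that is, they operate on an input $w$ and produce outputs $\mathcal{G}w$ and $\mathcal{H}w$. The
following properties hold.

$$(\mathcal{G} + \mathcal{H})w = \mathcal{G}w + \mathcal{H}w, \tag{2}$$
$$(\mathcal{G}\mathcal{H})w = \mathcal{G}(\mathcal{H}w), \tag{3}$$
$$(\alpha \mathcal{G})w = \alpha(\mathcal{G}w), \tag{4}$$

where $\alpha \in \mathbb{R}$. An interpretation of (2) is that a parallel combination of $\mathcal{G}$ and
$\mathcal{H}$ is equivalent to the system $\mathcal{G} + \mathcal{H}$. From (3), a series combination of $\mathcal{G}$
and $\mathcal{H}$ is equivalent to the system $\mathcal{G}\mathcal{H}$. Equation (4) states that scalar
amplification of a system is equivalent to scalar amplification of the system’s
output.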


1.2.5 Polynomial Fraction Systems 


The Wiener filtering results [4] - [6] were originally developed for polynomial
fraction descriptions of systems, which are described below. Consider an $n$th-order
linear, time-invariant system $\mathcal{G}$ that operates on an input $w(t) \in \mathbb{R}$ and produces
an output $y(t) \in \mathbb{R}$. Suppose that the differential equation model for this system is


“Science is a way of thinking much more than it is a body of knowledge.” Carl Edward Sagan 
$$a_n \frac{d^n y(t)}{dt^n} + a_{n-1} \frac{d^{n-1} y(t)}{dt^{n-1}} + \ldots + a_1 \frac{dy(t)}{dt} + a_0 y(t)
= b_m \frac{d^m w(t)}{dt^m} + b_{m-1} \frac{d^{m-1} w(t)}{dt^{m-1}} + \ldots + b_1 \frac{dw(t)}{dt} + b_0 w(t), \tag{5}$$

where $a_0, \ldots, a_n$ and $b_0, \ldots, b_m$ are real-valued constant coefficients, $a_n \neq 0$, with
zero initial conditions. This differential equation can be written in the more
compact form

$$\left( a_n \frac{d^n}{dt^n} + a_{n-1} \frac{d^{n-1}}{dt^{n-1}} + \ldots + a_1 \frac{d}{dt} + a_0 \right) y(t)
= \left( b_m \frac{d^m}{dt^m} + b_{m-1} \frac{d^{m-1}}{dt^{m-1}} + \ldots + b_1 \frac{d}{dt} + b_0 \right) w(t). \tag{6}$$


1.2.6 The Laplace Transform of a Signal 


The two-sided Laplace transform of a continuous-time signal $y(t) \in \mathbb{R}$ is denoted
by $Y(s)$ and defined by

$$Y(s) = \int_{-\infty}^{\infty} y(t) e^{-st} \, dt, \tag{7}$$

where $s = \sigma + j\omega$ is the Laplace transform variable, in which $\sigma, \omega \in \mathbb{R}$ and $j = \sqrt{-1}$.
Given a signal $y(t)$ with Laplace transform $Y(s)$, $y(t)$ can be calculated from
$Y(s)$ by taking the inverse Laplace transform of $Y(s)$, which is defined by

$$y(t) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} Y(s) e^{st} \, ds. \tag{8}$$


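Theorem 1 Parseval’s Theorem [7]:

$$\int_{-\infty}^{\infty} |y(t)|^2 \, dt = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} |Y(s)|^2 \, ds. \tag{9}$$

Proof. Let $y^H(t) = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} Y^H(s) e^{-st} \, ds$ and $Y^H(s)$ denote the Hermitian transpose
(or adjoint) of $y(t)$ and $Y(s)$, respectively. The left-hand-side of (9) may be written
as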

“No, no, you're not thinking; you're just being logical.” Niels Henrik David Bohr 
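$$\begin{aligned}
\int_{-\infty}^{\infty} |y(t)|^2 \, dt &= \int_{-\infty}^{\infty} y^H(t) y(t) \, dt \\
&= \int_{-\infty}^{\infty} \left[ \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} Y^H(s) e^{-st} \, ds \right] y(t) \, dt \\
&= \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \left[ \int_{-\infty}^{\infty} y(t) e^{-st} \, dt \right] Y^H(s) \, ds \\
&= \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} Y(s) Y^H(s) \, ds \\
&= \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} |Y(s)|^2 \, ds.
\end{aligned}$$

The above theorem is attributed to Parseval whose original work [7] concerned the
sums of trigonometric series. An interpretation of (9) is that the energy in the time
domain equals the energy in the frequency domain.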
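Parseval’s theorem can be checked numerically for a simple case. The sketch below is an illustration only: it assumes the signal $y(t) = e^{-t}$ for $t \ge 0$ (zero otherwise), whose transform under (7) is $Y(s) = (s+1)^{-1}$, and it evaluates the frequency-domain integral along $s = j\omega$, where $ds / (2\pi j) = d\omega / (2\pi)$. The grid sizes and truncation limits are arbitrary choices.

```python
import numpy as np

# Time-domain energy of y(t) = exp(-t), t >= 0 (Riemann sum).
dt = 1e-4
t = np.arange(0.0, 20.0, dt)
y = np.exp(-t)
energy_time = np.sum(np.abs(y) ** 2) * dt

# Frequency-domain energy along s = j*omega, where Y(s) = 1 / (s + 1).
d_omega = 1e-2
omega = np.arange(-1000.0, 1000.0, d_omega)
Y = 1.0 / (1j * omega + 1.0)
energy_freq = np.sum(np.abs(Y) ** 2) * d_omega / (2.0 * np.pi)

print(energy_time, energy_freq)  # both close to 0.5
```

Both sums approximate the exact energy of 1/2; the small residual differences come from truncating the integration ranges and from the finite step sizes.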


1.2.7 Polynomial Fraction Transfer Functions 


The steady-state response $y(t) = Y(s)e^{st}$ can be found by applying the complex-exponential
input $w(t) = W(s)e^{st}$ to the terms of (6), which results in

$$(a_n s^n + a_{n-1} s^{n-1} + \ldots + a_1 s + a_0) Y(s) e^{st} = (b_m s^m + b_{m-1} s^{m-1} + \ldots + b_1 s + b_0) W(s) e^{st}. \tag{10}$$

Therefore,

$$Y(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \ldots + b_1 s + b_0}{a_n s^n + a_{n-1} s^{n-1} + \ldots + a_1 s + a_0} W(s) = G(s) W(s), \tag{11}$$

where

$$G(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \ldots + b_1 s + b_0}{a_n s^n + a_{n-1} s^{n-1} + \ldots + a_1 s + a_0} \tag{12}$$



is known as the transfer function of the system. It can be seen from (6) and (12) 
that the polynomial transfer function coefficients correspond to the system’s 
differential equation coefficients. Thus, knowledge of a system’s differential 
equation is sufficient to identify its transfer function. 


“Nature laughs at the difficulties of integration.” Pierre-Simon Laplace 
1.2.8 Poles and Zeros


The numerator and denominator polynomials of (12) can be factored into $m$ and $n$
linear factors, respectively, to give

$$G(s) = \frac{b_m (s - \beta_1)(s - \beta_2) \ldots (s - \beta_m)}{a_n (s - \alpha_1)(s - \alpha_2) \ldots (s - \alpha_n)}. \tag{13}$$

The numerator of $G(s)$ is zero when $s = \beta_i$, $i = 1 \ldots m$. These values of $s$ are called
the zeros of $G(s)$. Zeros in the left-hand-plane are called minimum-phase whereas
zeros in the right-hand-plane are called non-minimum phase. The denominator of
$G(s)$ is zero when $s = \alpha_i$, $i = 1 \ldots n$. These values of $s$ are called the poles of $G(s)$.
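The factorisation (13) can be carried out numerically by finding polynomial roots. The following sketch assumes a hypothetical transfer function $G(s) = (2s + 5)/(s^2 + 3s + 2)$ chosen for illustration, and uses NumPy's `roots` function on the coefficient lists.

```python
import numpy as np

# Hypothetical example: G(s) = (2s + 5) / (s^2 + 3s + 2)
b = [2.0, 5.0]        # numerator coefficients, highest power first
a = [1.0, 3.0, 2.0]   # denominator coefficients, highest power first

zeros = np.roots(b)   # values of s at which the numerator vanishes
poles = np.roots(a)   # values of s at which the denominator vanishes

# Minimum-phase: every zero lies in the left-hand-plane.
minimum_phase = bool(np.all(zeros.real < 0))
print(sorted(poles.real), zeros, minimum_phase)
```

For this example the poles are at $s = -1$ and $s = -2$ and the single zero is at $s = -2.5$, so the system is minimum-phase.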


Example 1. Consider a system described by the differential equation $\dot{y}(t) = -y(t) + w(t)$,
in which $y(t)$ is the output arising from the input $w(t)$. From (6) and (12), it
follows that the corresponding transfer function is given by $G(s) = (s + 1)^{-1}$, which
possesses a pole at $s = -1$.

The system in Example 1 operates on a single input and produces a single output,
which is known as a single-input-single-output (SISO) system. Systems operating
on multiple inputs and producing multiple outputs are known as
multiple-input-multiple-output (MIMO) systems. The corresponding transfer function matrices can be
written as equation (14) below, where the components $G_{ij}(s)$ have the polynomial
transfer function form within (12) or (13).


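$$G(s) = \begin{bmatrix} G_{11}(s) & G_{12}(s) & \ldots & G_{1m}(s) \\ G_{21}(s) & G_{22}(s) & & \vdots \\ \vdots & & \ddots & \\ G_{p1}(s) & \ldots & & G_{pm}(s) \end{bmatrix}. \tag{14}$$


1.2.9 State-Space Systems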


Consider a system $\mathcal{G}$ that operates on an input signal $w = \{w(t) \in \mathbb{R}^m, t \in [0, T]\}$
and produces an output signal $y = \{y(t) \in \mathbb{R}^p, t \in [0, T]\}$, i.e., $y = \mathcal{G}w$.
Suppose that the system $\mathcal{G}$ has a state-space realisation of the form

$$\dot{x}(t) = Ax(t) + Bw(t), \tag{15}$$
$$y(t) = Cx(t) + Dw(t), \tag{16}$$


“Tt is important that students bring a certain ragamuffin, barefoot irreverence to their studies, they are 
not here to worship what is known but to question it.” Jacob Bronowski 
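where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$ and $D \in \mathbb{R}^{p \times m}$, in which $x(t) \in \mathbb{R}^n$ is a
state vector. $A = \{a_{ij}\}$ and $D = \{d_{ij}\}$ are respectively known as a state matrix and a
direct feed-through matrix. The matrices $B = \{b_{ij}\}$ and $C = \{c_{ij}\}$ are known as
input and output mappings, respectively. For ease of understanding, the state-space
model (15) - (16) may be written in the expanded form

$$\begin{bmatrix} \dot{x}_1(t) \\ \vdots \\ \dot{x}_n(t) \end{bmatrix} = \begin{bmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix} + \begin{bmatrix} b_{11} & \ldots & b_{1m} \\ \vdots & \ddots & \vdots \\ b_{n1} & \ldots & b_{nm} \end{bmatrix} \begin{bmatrix} w_1(t) \\ \vdots \\ w_m(t) \end{bmatrix},$$

$$\begin{bmatrix} y_1(t) \\ \vdots \\ y_p(t) \end{bmatrix} = \begin{bmatrix} c_{11} & \ldots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{p1} & \ldots & c_{pn} \end{bmatrix} \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix} + \begin{bmatrix} d_{11} & \ldots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{p1} & \ldots & d_{pm} \end{bmatrix} \begin{bmatrix} w_1(t) \\ \vdots \\ w_m(t) \end{bmatrix}.$$

This system is depicted in Fig. 1.

Fig. 1. Continuous-time state-space system.


1.2.10 Euler’s Method for Numerical Integration
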


Differential equations of the form (15) could be implemented directly by analog 
circuits. Digital or software implementations require a method for numerical 
integration. A first-order numerical integration technique, known as Euler’s 
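method, is now described. Suppose that $x(t)$ is infinitely differentiable and
consider its Taylor series expansion in the neighbourhood of $t_0$

$$\begin{aligned}
x(t) &= x(t_0) + \frac{(t - t_0)}{1!} \frac{dx(t_0)}{dt} + \frac{(t - t_0)^2}{2!} \frac{d^2 x(t_0)}{dt^2} + \frac{(t - t_0)^3}{3!} \frac{d^3 x(t_0)}{dt^3} + \ldots \\
&= x(t_0) + \frac{(t - t_0)}{1!} \dot{x}(t_0) + \frac{(t - t_0)^2}{2!} \ddot{x}(t_0) + \frac{(t - t_0)^3}{3!} \dddot{x}(t_0) + \ldots
\end{aligned} \tag{17}$$


“I have not failed. I’ve just found 10,000 ways that won’t work.” Thomas Alva Edison
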
Truncating the series after the first-order term yields the approximation $x(t) = x(t_0) + (t - t_0)\dot{x}(t_0)$.
Defining $t_k = t_{k-1} + \delta_t$ leads to

$$\begin{aligned}
x(t_1) &= x(t_0) + \delta_t \dot{x}(t_0) \\
x(t_2) &= x(t_1) + \delta_t \dot{x}(t_1) \\
&\;\;\vdots \\
x(t_{k+1}) &= x(t_k) + \delta_t \dot{x}(t_k).
\end{aligned} \tag{18}$$

Thus, the continuous-time linear system (15) could be approximated in discrete-time
by iterating

$$\dot{x}(t_k) = Ax(t_k) + Bw(t_k) \tag{19}$$

and (18), provided that $\delta_t$ is chosen to be suitably small. Applications of (18) - (19)
appear in [9] and in the following example.


Example 2. In respect of the continuous-time state evolution (15), consider $A = -1$,
$B = 1$ together with the deterministic input $w(t) = \sin(t) + \cos(t)$. The states
can be calculated from the known $w(t)$ using (19) and the difference equation (18).
In this case, the state error is given by $e(t) = \sin(t) - x(t)$. In particular,
root-mean-square-errors of 0.34, 0.031, 0.0025 and 0.00024 were observed for $\delta_t$ = 1,
0.1, 0.01 and 0.001, respectively. This demonstrates that the first-order
approximation (18) can be reasonable when $\delta_t$ is sufficiently small.
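A minimal sketch of the experiment in Example 2 is given below, assuming a zero initial state and a simulation interval of 10 seconds. Neither of these is stated in the text, so the computed root-mean-square-errors need not coincide exactly with the quoted figures; the function name `euler_rmse` is likewise an illustrative choice.

```python
import numpy as np

def euler_rmse(dt, t_end=10.0):
    """Euler-integrate x' = A x + B w for A = -1, B = 1, w(t) = sin(t) + cos(t),
    x(0) = 0, and return the root-mean-square of e(t) = sin(t) - x(t)."""
    A, B = -1.0, 1.0
    n_steps = int(round(t_end / dt))
    x = 0.0
    errors = np.empty(n_steps)
    for k in range(n_steps):
        t_k = k * dt
        errors[k] = np.sin(t_k) - x
        x_dot = A * x + B * (np.sin(t_k) + np.cos(t_k))  # state derivative at t_k
        x = x + dt * x_dot                                # Euler update, as in (18)
    return float(np.sqrt(np.mean(errors ** 2)))

for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, euler_rmse(dt))
```

The error shrinks roughly in proportion to the step size, which is the expected behaviour of a first-order method.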


1.2.11 State-Space Transfer Function Matrix 

The transfer function matrix of the state-space system (15) - (16) is defined by

$$G(s) = C(sI - A)^{-1}B + D, \tag{20}$$

in which $s$ again denotes the Laplace transform variable.


Example 3. For a state-space model with $A = -1$, $B = C = 1$ and $D = 0$, the
transfer function is $G(s) = (s + 1)^{-1}$.


“Science is everything we understand well enough to explain to a computer. Art is everything else.” 
David Ervin Knuth 
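Example 4. For state-space parameters $A = \begin{bmatrix} -3 & -2 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 2 & 5 \end{bmatrix}$
and $D = 0$, the use of Cramer’s rule, that is, $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$,
yields the transfer function

$$G(s) = \frac{2s + 5}{(s + 1)(s + 2)} = \frac{3}{s + 1} - \frac{1}{s + 2}.$$


Example 5. Substituting $A = \begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}$ and $B = C = D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ into (20)
results in the transfer function matrix

$$G(s) = \begin{bmatrix} \dfrac{s + 2}{s + 1} & 0 \\ 0 & \dfrac{s + 3}{s + 2} \end{bmatrix}.$$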
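Formula (20) is straightforward to evaluate numerically at a given value of $s$. The sketch below is an illustration rather than a prescribed implementation: the helper name `transfer_function` and the test point $s = 1$ are assumptions, and the parameters are those read from Example 4, namely $A = [-3, -2; 1, 0]$, $B = [1; 0]$, $C = [2, 5]$ and $D = 0$.

```python
import numpy as np

def transfer_function(A, B, C, D, s):
    """Evaluate G(s) = C (sI - A)^{-1} B + D at a single complex frequency s."""
    A, B, C, D = (np.atleast_2d(np.asarray(M, dtype=complex)) for M in (A, B, C, D))
    n = A.shape[0]
    # Solve (sI - A) X = B instead of forming the inverse explicitly.
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

# Example 4 parameters, evaluated at the arbitrary test point s = 1
A = [[-3.0, -2.0], [1.0, 0.0]]
B = [[1.0], [0.0]]
C = [[2.0, 5.0]]
D = [[0.0]]
s = 1.0
G = transfer_function(A, B, C, D, s)
print(G, (2 * s + 5) / ((s + 1) * (s + 2)))
```

The two printed values agree, which is a quick way of checking a hand-derived transfer function against its state-space parameters.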


1.2.12 Canonical Realisations 


The mapping of a polynomial fraction transfer function (12) to a state-space 
representation (20) is not unique. Two standard state-space realisations of 
polynomial fraction transfer functions are described below. Assume that: the
transfer function has been expanded into the sum of a direct feed-through term plus
a strictly proper transfer function, in which the order of the numerator polynomial
is less than the order of the denominator polynomial; and the strictly proper
transfer function has been normalised so that $a_n = 1$. Under these assumptions, the
system can be realised in the controllable canonical form which is parameterised
by [10]

$$A = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \ldots & -a_1 & -a_0 \\ 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & & & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & 0 & \ldots & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} b_m & b_{m-1} & \ldots & b_1 & b_0 \end{bmatrix}.$$

“Science might almost be redefined as the process of substituting unimportant questions which can be
answered for important questions which cannot.” Kenneth Ewart Boulding


The system can also be realised in the observable canonical form which is
parameterised by

$$A = \begin{bmatrix} -a_{n-1} & 1 & 0 & \ldots & 0 \\ -a_{n-2} & 0 & 1 & & 0 \\ \vdots & & & \ddots & \vdots \\ -a_1 & 0 & & & 1 \\ -a_0 & 0 & \ldots & 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} b_m \\ b_{m-1} \\ \vdots \\ b_1 \\ b_0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 0 & \ldots & 0 & 0 \end{bmatrix}.$$
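The two canonical forms can be constructed programmatically from the polynomial coefficients. The sketch below is illustrative: the function names are assumptions, the denominator is taken to be monic ($a_n = 1$), and the observable form is obtained as the transpose (dual) of the controllable form, which is a standard relationship rather than something stated in the text.

```python
import numpy as np

def controllable_canonical(b, a):
    """Controllable canonical form of a strictly proper b(s)/a(s).

    a = [1, a_{n-1}, ..., a_0] (monic) and b = [b_m, ..., b_0] with m < n,
    both listed from the highest power of s downwards.
    """
    a = np.asarray(a, dtype=float)
    n = len(a) - 1
    # Pad the numerator with leading zeros so that C has n entries.
    b = np.concatenate([np.zeros(n - len(b)), np.asarray(b, dtype=float)])
    A = np.zeros((n, n))
    A[0, :] = -a[1:]                 # first row: -a_{n-1}, ..., -a_0
    A[1:, :-1] = np.eye(n - 1)       # shifted identity below the first row
    B = np.zeros((n, 1))
    B[0, 0] = 1.0
    C = b.reshape(1, n)
    return A, B, C

def observable_canonical(b, a):
    """Observable canonical form, built as the dual of the controllable form:
    A_o = A_c^T, B_o = C_c^T, C_o = B_c^T."""
    A, B, C = controllable_canonical(b, a)
    return A.T, C.T, B.T

# Hypothetical example: G(s) = (2s + 5) / (s^2 + 3s + 2)
A_c, B_c, C_c = controllable_canonical([2.0, 5.0], [1.0, 3.0, 2.0])
A_o, B_o, C_o = observable_canonical([2.0, 5.0], [1.0, 3.0, 2.0])
print(A_c, A_o, sep="\n")
```

Both realisations describe the same transfer function, which illustrates that the mapping from (12) to a state-space model is not unique.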


1.2.13 Asymptotic Stability 


Consider a continuous-time, linear, time-invariant $n$th-order system $\mathcal{G}$ that
operates on an input $w$ and produces an output $y$. The system $\mathcal{G}$ is said to be
asymptotically stable if the output remains bounded, that is, $y \in \mathcal{L}_2$, for any
$w \in \mathcal{L}_2$. This is also known as bounded-input-bounded-output stability. Two equivalent
conditions for $\mathcal{G}$ to be asymptotically stable are:
(i) The eigenvalues of the system’s state matrix are in the
left-hand-plane, that is, for $A$ of (20), $\mathrm{Re}\{\lambda_i(A)\} < 0$, $i = 1 \ldots n$.
(ii) The poles of the system’s transfer function are in the
left-hand-plane, that is, for $\alpha_i$ of (13), $\mathrm{Re}\{\alpha_i\} < 0$, $i = 1 \ldots n$.


Example 6. A state-space system having $A = -1$, $B = C = 1$ and $D = 0$ is stable,
since $\lambda(A) = -1$ is in the left-hand-plane. Equivalently, the corresponding transfer
function $G(s) = (s + 1)^{-1}$ has a pole at $s = -1$ which is in the left-hand-plane and
so the system is stable. Conversely, the transfer function $G^T(-s) = (1 - s)^{-1}$ is
unstable because it has a singularity at the pole $s = 1$ which is in the right-hand
side of the complex plane. $G^T(-s)$ is known as the adjoint of $G(s)$ which is
discussed below.
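The two stability conditions can be tested numerically. The sketch below is a minimal illustration using NumPy; the function names are assumptions, and the inputs are the scalar systems of Example 6.

```python
import numpy as np

def is_asymptotically_stable(A):
    """Condition (i): every eigenvalue of the state matrix has a negative real part."""
    eigenvalues = np.linalg.eigvals(np.atleast_2d(np.asarray(A, dtype=float)))
    return bool(np.all(eigenvalues.real < 0))

def poles_in_left_half_plane(a):
    """Condition (ii): every root of the denominator polynomial has a negative real part.
    The coefficients in `a` are listed from the highest power of s downwards."""
    return bool(np.all(np.roots(a).real < 0))

print(is_asymptotically_stable(-1.0))          # state matrix of Example 6
print(poles_in_left_half_plane([1.0, 1.0]))    # G(s) = 1 / (s + 1)
print(poles_in_left_half_plane([-1.0, 1.0]))   # G^T(-s) = 1 / (1 - s)
```

The first two checks report a stable system and the third reports an unstable one, in agreement with Example 6.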


1.2.14 Adjoint Systems 


An important concept in the ensuing development of filters and smoothers is the 
adjoint of a system. Consider again a system $\mathcal{G}$ that operates on an input signal $w$
and produces an output signal $\mathcal{G}w$. Then $\mathcal{G}^H$, the adjoint of $\mathcal{G}$, is the unique
linear system which produces an output signal $\mathcal{G}^H u$, such that $\langle u, \mathcal{G}w \rangle = \langle \mathcal{G}^H u, w \rangle$,
for all real-valued $u$ and $w$ of compatible dimensions. The following
derivation is a simplification of the time-varying version that appears in [11]. 


Lemma 1 (State-space representation of an adjoint system): Suppose that a 
continuous-time linear time-invariant system G is described by 


“If you thought that science was certain—well, that is just an error on your part.” Richard Phillips 
Feynman 
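$$\dot{x}(t) = Ax(t) + Bw(t), \tag{21}$$
$$y(t) = Cx(t) + Dw(t), \tag{22}$$

with $x(t_0) = 0$. The adjoint $\mathcal{G}^H$ is the linear system having the realisation

$$\dot{\zeta}(t) = -A^T \zeta(t) - C^T u(t), \tag{23}$$
$$z(t) = B^T \zeta(t) + D^T u(t), \tag{24}$$

with $\zeta(T) = 0$.

Proof: The system (21) - (22) can be written equivalently

$$\begin{bmatrix} \frac{d}{dt} I - A & -B \\ C & D \end{bmatrix} \begin{bmatrix} x(t) \\ w(t) \end{bmatrix} = \begin{bmatrix} 0 \\ y(t) \end{bmatrix} \tag{25}$$

with $x(t_0) = 0$. Thus

$$\begin{aligned}
\langle u, \mathcal{G}w \rangle &= \left\langle \begin{bmatrix} \zeta \\ u \end{bmatrix}, \begin{bmatrix} \frac{d}{dt} I - A & -B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix} \right\rangle \\
&= \int_0^T \zeta^T \frac{dx}{dt} \, dt - \int_0^T \zeta^T (Ax + Bw) \, dt + \int_0^T u^T (Cx + Dw) \, dt.
\end{aligned} \tag{26}$$

Integrating the first term by parts gives

$$\begin{aligned}
\langle u, \mathcal{G}w \rangle &= \zeta^T(T) x(T) - \int_0^T \frac{d\zeta^T}{dt} x \, dt - \int_0^T \zeta^T (Ax + Bw) \, dt + \int_0^T u^T (Cx + Dw) \, dt \\
&= \left\langle \begin{bmatrix} -\frac{d}{dt} I - A^T & C^T \\ -B^T & D^T \end{bmatrix} \begin{bmatrix} \zeta \\ u \end{bmatrix}, \begin{bmatrix} x \\ w \end{bmatrix} \right\rangle + \zeta^T(T) x(T) \\
&= \langle \mathcal{G}^H u, w \rangle,
\end{aligned} \tag{27}$$

where $\mathcal{G}^H$ is given by (23) - (24).


“We haven't the money, so we've got to think.” Baron Ernest Rutherford
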
Thus, the adjoint of a continuous-time system having the parameters $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ is a
system with $\begin{bmatrix} -A^T & -C^T \\ B^T & D^T \end{bmatrix}$. Adjoint systems have the property $(\mathcal{G}^H)^H = \mathcal{G}$.


The adjoint of the transfer function matrix $G(s)$ is denoted as $G^H(s)$ and is defined
by the transfer function matrix

$$G^H(s) = G^T(-s). \tag{28}$$


Example 7. Suppose that a system $\mathcal{G}$ has state-space parameters $A = -1$ and
$B = C = D = 1$. From (23) - (24), an adjoint system has the state-space parameters $A = 1$,
$B = D = 1$ and $C = -1$ and the corresponding transfer function is
$G^H(s) = 1 - (s - 1)^{-1} = (-s + 2)(-s + 1)^{-1} = (s - 2)(s - 1)^{-1}$, which is unstable and
non-minimum-phase. Alternatively, the adjoint of $G(s) = 1 + (s + 1)^{-1} = (s + 2)(s + 1)^{-1}$ can be
obtained using (28), namely $G^H(s) = G^T(-s) = (-s + 2)(-s + 1)^{-1}$.
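The agreement between the state-space adjoint (23) - (24) and the transfer function definition (28) can be checked numerically. In the sketch below, the helper names `adjoint_parameters` and `tf` and the test frequency are illustrative assumptions; the system is the one from Example 7.

```python
import numpy as np

def adjoint_parameters(A, B, C, D):
    """State-space parameters of the adjoint system per (23)-(24):
    state matrix -A^T, input matrix -C^T, output matrix B^T, feedthrough D^T."""
    return -A.T, -C.T, B.T, D.T

def tf(A, B, C, D, s):
    """Evaluate C (sI - A)^{-1} B + D at the complex frequency s."""
    return C @ np.linalg.solve(s * np.eye(A.shape[0]) - A, B) + D

# Example 7: A = -1, B = C = D = 1
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
D = np.array([[1.0]])
Aa, Ba, Ca, Da = adjoint_parameters(A, B, C, D)

s = 0.5 + 2.0j                    # arbitrary test point
lhs = tf(Aa, Ba, Ca, Da, s)       # transfer function of the adjoint realisation
rhs = tf(A, B, C, D, -s).T        # G^T(-s), as in (28)
print(lhs, rhs, (s - 2) / (s - 1))
```

All three printed values coincide, consistent with $G^H(s) = (s - 2)(s - 1)^{-1}$ in Example 7.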


1.2.15 Causal and Noncausal Systems 
A causal system is a system that depends exclusively on past and current inputs. 


Example 8. The differential of $x(t)$ with respect to $t$ is defined by
$\dot{x}(t) = \lim_{dt \to 0} \frac{x(t + dt) - x(t)}{dt}$. Consider

$$\dot{x}(t) = Ax(t) + Bw(t) \tag{29}$$

with $\mathrm{Re}\{\lambda_i(A)\} < 0$, $i = 1, \ldots, n$. The positive sign of $\dot{x}(t)$ within (29) denotes a
system that proceeds forward in time. This is called a causal system because it
depends only on past and current inputs.

Example 9. The negative differential of $\zeta(t)$ with respect to $t$ is defined by
$-\dot{\zeta}(t) = \lim_{dt \to 0} \frac{\zeta(t) - \zeta(t + dt)}{dt}$. Consider

$$-\dot{\zeta}(t) = A^T \zeta(t) + C^T u(t) \tag{30}$$

with $\mathrm{Re}\{\lambda_i(A)\} = \mathrm{Re}\{\lambda_i(A^T)\} < 0$, $i = 1 \ldots n$. The negative sign of $\dot{\zeta}(t)$ within
(30) denotes a system that proceeds backwards in time. Since this system depends
on future inputs, it is termed noncausal. Note that $\mathrm{Re}\{\lambda_i(A)\} < 0$ implies
$\mathrm{Re}\{\lambda_i(-A)\} > 0$. Hence, if the causal system (21) - (22) is stable, then its adjoint (23)
- (24) is unstable.


“Science is simply common sense at its best that is, rigidly accurate in observation, and merciless to
fallacy in logic.” Thomas Henry Huxley


1.2.16 Realising Unstable System Components 


Unstable systems are termed unrealisable because their outputs are not in $\mathcal{L}_2$, that
is, they are unbounded. In other words, they cannot be implemented as
forward-going systems. It follows from the above discussion that an unstable system
component can be realised as a stable noncausal or backwards system. 


Suppose that the time domain system $\mathcal{G}$ is stable. The adjoint system $z = \mathcal{G}^H u$
can be realised by the following three-step procedure.
(i) Time-reverse the input signal $u(t)$, that is, construct $u(\tau)$, where $\tau = T - t$
is a time-to-go variable (see [12]).
(ii) Realise the stable system $\mathcal{G}^T$

$$\dot{\zeta}(\tau) = A^T \zeta(\tau) + C^T u(\tau), \tag{31}$$
$$z(\tau) = B^T \zeta(\tau) + D^T u(\tau), \tag{32}$$

with $\zeta(T) = 0$.
(iii) Time-reverse the output signal $z(\tau)$, that is, construct $z(t)$.


The above procedure is known as noncausal filtering or smoothing; see the 
discrete-time examples described in [13]. Thus, a combination of causal and non- 
causal system components can be used to implement an otherwise unrealisable 
system. This approach will be exploited in the realisation of smoothers within 
subsequent chapters. 
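
The three-step procedure can be sketched numerically. The following Python fragment is an illustrative sketch only: the function name `realise_adjoint`, the forward-Euler integration and the scalar test system (A = -1, B = C = 1, D = 0) are assumptions introduced here and do not come from the text. It time-reverses the input, runs the stable transposed system and time-reverses the output.

```python
import numpy as np

def realise_adjoint(A, B, C, D, u, dt):
    """Sketch of steps (i)-(iii): returns z = G^H u sampled at the same instants as u.

    u has one row per time sample. Forward-Euler integration is an assumption
    made for brevity; any stable integrator could be substituted.
    """
    A, B, C, D = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, C, D))
    u = np.asarray(u, dtype=float).reshape(len(u), -1)
    u_rev = u[::-1]                                   # step (i): time-reverse the input
    zeta = np.zeros(A.shape[0])                       # zero initial state in time-to-go
    z_rev = np.zeros((len(u), B.shape[1]))
    for k, u_k in enumerate(u_rev):                   # step (ii): run the stable system G^T
        z_rev[k] = B.T @ zeta + D.T @ u_k
        zeta = zeta + dt * (A.T @ zeta + C.T @ u_k)
    return z_rev[::-1]                                # step (iii): time-reverse the output

# Hypothetical scalar example: an impulse applied half-way through the interval.
N, dt = 1000, 1e-3
u = np.zeros(N)
u[500] = 1.0 / dt
z = realise_adjoint(-1.0, 1.0, 1.0, 0.0, u, dt)
```

In this sketch the response is nonzero only at times before the impulse and decays going backwards in time, which is the noncausal behaviour described above.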


Example 10. Suppose that it is required to realise the unstable system
$G(s) = G_2^H(s)G_1(s)$ over an interval [0, T], where $G_1(s) = (s+1)^{-1}$ and
$G_2(s) = (s+2)^{-1}$. This system can be realised using the processes shown in Fig. 2.


[Figure 2: block diagram containing time-reverse and transpose blocks.]

Fig. 2. Realising an unstable $G(s) = G_2^H(s)G_1(s)$.


“Time is what prevents everything from happening at once.” John Archibald Wheeler 
1.2.17 Power Spectral Density


The power of a voltage signal applied to a 1-ohm load is defined as the squared
value of the signal and is expressed in watts. The power spectral density is
expressed as power per unit bandwidth, that is, W/Hz. Consider again a linear,
time-invariant system $y = \mathcal{G}w$ and its corresponding transfer function matrix
G(s). Assume that w is a zero-mean, stationary, white noise process with
$E\{w(t)w^T(\tau)\} = Q\delta(t - \tau)$, in which $\delta$ denotes the Dirac delta function. Then
$\Phi_{yy}(s)$, the power spectral density of y, is given by

$$\Phi_{yy}(s) = GQG^H(s), \qquad (33)$$

which has the property $\Phi_{yy}(s) = \Phi_{yy}(-s)$.

The total energy of a signal is the integral of the power of the signal over time and
is expressed in watt-seconds or joules. From Parseval’s theorem (9), the average
total energy of y(t) is

$$\int_{-j\infty}^{j\infty} \Phi_{yy}(s)\,ds = \int_{-\infty}^{\infty} |y(t)|^2\,dt = \|y(t)\|_2^2 = E\{y^T(t)y(t)\}, \qquad (34)$$

which is equal to the area under the power spectral density curve.


1.2.18 Spectral Factorisation 


Suppose that noisy measurements

$$z(t) = y(t) + v(t) \qquad (35)$$

of a linear, time-invariant system $\mathcal{G}$, described by (21) - (22), are available,
where $v(t) \in \mathbb{R}^p$ is an independent, zero-mean, stationary white noise process
with $E\{v(t)v^T(\tau)\} = R\delta(t - \tau)$. Let

$$\Phi_{zz}(s) = GQG^H(s) + R \qquad (36)$$

denote the spectral density matrix of the measurements z(t). Spectral factorisation
was pioneered by Wiener (see [4] and [5]). It refers to the problem of
decomposing a spectral density matrix into a product of a stable, minimum-phase
matrix transfer function and its adjoint. In the case of the output power spectral
density (36), a spectral factor $\Delta(s)$ satisfies $\Delta(s)\Delta^H(s) = \Phi_{zz}(s)$.


“Science may be described as the art of systematic over-simplification.” Karl Raimund Popper 


G. A. Einicke, Smoothing, Filtering and Prediction: Estimating
the Past, Present and Future (2nd ed.), Prime Publishing, 2019


The problem of spectral factorisation within continuous-time Wiener filtering 
problems is studied in [14]. The roots of the transfer function polynomials need to 
be sorted into those within the left-hand-plane and the right-hand plane. This is an 
eigenvalue decomposition problem — see the survey of spectral factorisation 
methods detailed in [15]. 


Example 11. In respect of the observation spectral density (36), suppose that $G(s) = (s + 1)^{-1}$
and Q = R = 1, which results in $\Phi_{zz}(s) = (-s^2 + 2)(-s^2 + 1)^{-1}$. By
inspection, the spectral factor $\Delta(s) = (s + \sqrt{2})(s + 1)^{-1}$ is stable, minimum-phase
and satisfies $\Delta(s)\Delta^H(s) = \Phi_{zz}(s)$.
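
The root-sorting idea can be illustrated for the scalar case. The sketch below is an assumption-laden illustration rather than a method from the text: the function name `spectral_factor` is hypothetical, and it assumes a scalar spectral density given as a ratio of real polynomials in s whose roots are mirrored about the imaginary axis with none on the axis. It keeps the left-hand-plane roots and reproduces the factor found by inspection in Example 11.

```python
import numpy as np

def spectral_factor(num, den):
    """Return (numerator, denominator) coefficients of a stable, minimum-phase factor.

    num, den: coefficients (highest power first) of Phi(s) = num(s)/den(s).
    """
    def lhp_roots(coeffs):
        roots = np.roots(coeffs)
        return roots[roots.real < 0]                  # keep left-hand-plane roots only

    zeros, poles = lhp_roots(num), lhp_roots(den)
    # Mirroring each retained root contributes a factor of -1 to the leading coefficient.
    sign = (-1.0) ** (len(zeros) - len(poles))
    gain = np.sqrt(sign * num[0] / den[0])
    return gain * np.real(np.poly(zeros)), np.real(np.poly(poles))

# Example 11: Phi_zz(s) = (-s^2 + 2)/(-s^2 + 1).
num_f, den_f = spectral_factor([-1.0, 0.0, 2.0], [-1.0, 0.0, 1.0])
```

Here `num_f` and `den_f` hold the coefficients of s + sqrt(2) and s + 1, respectively.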


1.3. Minimum-Mean-Square-Error Filtering 


1.3.1 Filter Derivation 


Now that some underlying frequency-domain concepts have been introduced, the 
Wiener filter [4] — [6] can be described. A Wiener-Hopf derivation of the Wiener 
filter appears in [4], [6]. This section describes a simpler completing-the-square 
approach (see [14], [16]). Consider a linear time-invariant system having a
transfer function matrix $G_2(s) = C_2(sI - A)^{-1}B + D_2$. Let $Y_2(s)$, W(s), V(s) and Z(s)
denote the Laplace transforms of the system’s output, process noise, measurement
noise and observations, respectively, so that

$$Z(s) = Y_2(s) + V(s). \qquad (37)$$

Consider also a fictitious reference system having the transfer function
$G_1(s) = C_1(sI - A)^{-1}B + D_1$, as shown in Fig. 3. The problem is to design a filter transfer
function H(s) to calculate estimates $\hat{Y}_1(s) = H(s)Z(s)$ of $Y_1(s)$ so that the energy
$\int_{-j\infty}^{j\infty} \tilde{Y}(s)\tilde{Y}^H(s)\,ds$ of the estimation error

$$\tilde{Y}(s) = Y_1(s) - \hat{Y}_1(s) \qquad (38)$$

is minimised. It follows from the configuration of Fig. 3 that $\tilde{Y}(s)$ is generated by

$$\tilde{Y}(s) = -\begin{bmatrix} H(s) & HG_2(s) - G_1(s) \end{bmatrix} \begin{bmatrix} V(s) \\ W(s) \end{bmatrix}. \qquad (39)$$

“Science is what you know. Philosophy is what you don't know.” Earl Bertrand Arthur William 
Russell 
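Fig. 3. The s-domain general filtering problem.


The error power spectrum density matrix is denoted by $\Phi_{ee}(s)$ and given by the
covariance of $\tilde{Y}(s)$, that is,

$$\begin{aligned}
\Phi_{ee}(s) &= \tilde{Y}(s)\tilde{Y}^H(s) \\
&= \begin{bmatrix} H(s) & HG_2(s) - G_1(s) \end{bmatrix}
\begin{bmatrix} R & 0 \\ 0 & Q \end{bmatrix}
\begin{bmatrix} H^H(s) \\ G_2^H H^H(s) - G_1^H(s) \end{bmatrix} \qquad (40) \\
&= G_1 Q G_1^H(s) - G_1 Q G_2^H H^H(s) - H G_2 Q G_1^H(s) + H \Delta\Delta^H H^H(s),
\end{aligned}$$

where

$$\Delta\Delta^H(s) = G_2 Q G_2^H(s) + R \qquad (41)$$

is the spectral density matrix of the measurements. The quantity $\Delta(s)$ is called a
spectral factor, which is unique up to the product of an inner matrix. Denote
$\Delta^{-H}(s) = (\Delta^H)^{-1}(s)$. Completing the square within (40) yields

$$\begin{aligned}
\Phi_{ee}(s) = {} & G_1 Q G_1^H(s) - G_1 Q G_2^H (\Delta\Delta^H)^{-1} G_2 Q G_1^H(s) \\
& + (H\Delta(s) - G_1 Q G_2^H \Delta^{-H}(s))(H\Delta(s) - G_1 Q G_2^H \Delta^{-H}(s))^H. \qquad (42)
\end{aligned}$$

It follows that the total energy of the error signal is given by

$$\begin{aligned}
\int_{-j\infty}^{j\infty} \Phi_{ee}(s)\,ds = {} & \int_{-j\infty}^{j\infty} G_1 Q G_1^H(s) - G_1 Q G_2^H (\Delta\Delta^H)^{-1} G_2 Q G_1^H(s)\,ds \\
& + \int_{-j\infty}^{j\infty} (H\Delta(s) - G_1 Q G_2^H \Delta^{-H}(s))(H\Delta(s) - G_1 Q G_2^H \Delta^{-H}(s))^H\,ds. \qquad (43)
\end{aligned}$$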

“There is an astonishing imagination, even in the science of mathematics.” Francois-Marie Arouet de 
Voltaire 
The first term on the right-hand-side of (43) is independent of H(s) and represents
a lower bound of $\int_{-j\infty}^{j\infty} \Phi_{ee}(s)\,ds$. The second term on the right-hand-side of (43)
may be minimised by a judicious choice for H(s).


Theorem 2: The above linear time-invariant filtering problem with the
measurements (37) and estimation error (38) has the solution

$$H(s) = G_1 Q G_2^H \Delta^{-H} \Delta^{-1}(s), \qquad (44)$$

which minimises $\int_{-j\infty}^{j\infty} \Phi_{ee}(s)\,ds$.


Proof: The result follows by setting $H\Delta(s) - G_1 Q G_2^H \Delta^{-H}(s) = 0$ within (43).


By Parseval’s theorem, the minimum mean-square-error solution (44) also
minimises $\|\tilde{y}(t)\|_2^2$.


The solution (44) is unstable because the factor $G_2^H(\Delta^H)^{-1}(s)$ possesses
right-hand-plane poles. This optimal noncausal solution is actually a smoother, which
can be realised by a combination of forward and backward processes. Wiener
called (44) the optimal unrealisable solution because it cannot be realised by a
memory-less network of capacitors, inductors and resistors [4].


The transfer function matrix of a realisable filter is given by

$$H(s) = \{G_1 Q G_2^H (\Delta^H)^{-1}\}_+ \Delta^{-1}(s), \qquad (45)$$


in which { }+ denotes the causal part. A procedure for finding the causal part of a 
transfer function is described below. 


1.3.2 Finding the Causal Part of a Transfer Function 


The causal part of a transfer function can be found by carrying out the following
three steps.

(i) If the transfer function is not strictly proper, that is, if the order of the
numerator is not less than the order of the denominator, then perform
synthetic division to isolate the constant term.

(ii) Expand out the (strictly proper) transfer function into the sum of stable
and unstable partial fractions.

(iii) The causal part is the sum of the constant term and the stable partial
fractions.


“The important thing in science is not so much to obtain new facts as to discover new ways of thinking 
about them.” William Henry Bragg 
Incidentally, the noncausal part is what remains, namely the sum of the unstable
partial fractions.


Example 12. Consider $G_2(s) = (s^2 - \beta^2)(s^2 - \alpha^2)^{-1}$ with $\alpha, \beta < 0$. Since $G_2(s)$
possesses equal order numerator and denominator polynomials, synthetic division
is required, which yields $G_2(s) = 1 + (\alpha^2 - \beta^2)(s^2 - \alpha^2)^{-1}$. A partial fraction
expansion results in

$$\frac{(\alpha^2 - \beta^2)}{(s^2 - \alpha^2)} = \frac{0.5\alpha^{-1}(\alpha^2 - \beta^2)}{(s - \alpha)} - \frac{0.5\alpha^{-1}(\alpha^2 - \beta^2)}{(s + \alpha)}.$$

Since $\alpha < 0$, the pole at $s = \alpha$ is stable and the pole at $s = -\alpha$ is unstable. Thus, the
causal part of $G_2(s)$ is $\{G_2(s)\}_+ = 1 + 0.5\alpha^{-1}(\alpha^2 - \beta^2)(s - \alpha)^{-1}$. The
noncausal part of $G_2(s)$ is denoted as $\{G_2(s)\}_-$ and is given by
$\{G_2(s)\}_- = -0.5\alpha^{-1}(\alpha^2 - \beta^2)(s + \alpha)^{-1}$. It is easily verified that $G_2(s) = \{G_2(s)\}_+ + \{G_2(s)\}_-$.
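
The three steps for finding the causal part can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `causal_part` is hypothetical, and it assumes simple (non-repeated) poles that are off the imaginary axis and a numerator order no greater than the denominator order. The values α = -1 and β = -2 are chosen here purely for illustration.

```python
import numpy as np

def causal_part(num, den):
    """Split num(s)/den(s) into (constant, stable terms, unstable terms).

    Each term is a (residue, pole) pair representing residue / (s - pole).
    """
    num = np.atleast_1d(np.asarray(num, dtype=float))
    den = np.atleast_1d(np.asarray(den, dtype=float))
    quotient, remainder = np.polydiv(num, den)        # step (i): synthetic division
    constant = quotient[-1]
    poles = np.roots(den)
    # step (ii): residues of the strictly proper remainder at each simple pole
    residues = np.polyval(remainder, poles) / np.polyval(np.polyder(den), poles)
    stable = [(r, p) for r, p in zip(residues, poles) if p.real < 0]
    unstable = [(r, p) for r, p in zip(residues, poles) if p.real > 0]
    return constant, stable, unstable                 # step (iii): constant + stable terms

# (s^2 - beta^2)/(s^2 - alpha^2) with alpha = -1, beta = -2.
constant, stable, unstable = causal_part([1.0, 0.0, -4.0], [1.0, 0.0, -1.0])
```

For these values the causal part is 1 + 1.5/(s + 1) and the noncausal part is -1.5/(s - 1).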


Fig. 4. The s-domain output estimation problem. 


1.3.3. Minimum-Mean-Square-Error Output Estimation 


In output estimation, the reference system is the same as the generating system, as
depicted in Fig. 4. The simplification of the optimal noncausal solution (44) of
Theorem 2 for the case $G_1(s) = G_2(s)$ can be expressed as

$$\begin{aligned}
H_{OE}(s) &= G_2 Q G_2^H \Delta^{-H} \Delta^{-1}(s) \\
&= G_2 Q G_2^H (\Delta\Delta^H)^{-1}(s) \\
&= (\Delta\Delta^H - R)(\Delta\Delta^H)^{-1}(s) \\
&= I - R\Delta^{-H}\Delta^{-1}(s). \qquad (46)
\end{aligned}$$

The optimal causal solution for output estimation is [14]

$$\begin{aligned}
H_{OE}(s) &= \{G_2 Q G_2^H \Delta^{-H}\}_+ \Delta^{-1}(s) \\
&= I - R\{\Delta^{-H}\}_+ \Delta^{-1}(s) \\
&= I - R^{1/2}\Delta^{-1}(s). \qquad (47)
\end{aligned}$$

When the measurement noise becomes negligibly small, the output estimator
approaches a short circuit, that is,

$$\lim_{R \to 0,\, s \to 0} |H_{OE}(s)| = 1. \qquad (48)$$

The observation (48) can be verified by substituting $\Delta\Delta^H(s) = G_2 Q G_2^H(s)$ into
(46). This observation is consistent with intuition, that is, when the measurements
are perfect, filtering will be superfluous.


“Science is the topography of ignorance.” Oliver Wendell Holmes


Example 13. Consider a scalar output estimation problem, where $G_2(s) = (s - \alpha)^{-1}$,
$\alpha = -1$, Q = 1 and R = 0.0001. Then $G_2 Q G_2^H(s) = Q(-s^2 + \alpha^2)^{-1}$ and
$\Delta\Delta^H(s) = (-Rs^2 + R\alpha^2 + Q)(-s^2 + \alpha^2)^{-1}$, which leads to
$\Delta(s) = R^{1/2}(s + \sqrt{\alpha^2 + Q/R})(s - \alpha)^{-1}$. Therefore,

$$G_2 Q G_2^H (\Delta^H)^{-1}(s) = \frac{Q(-s - \alpha)}{(s - \alpha)(-s - \alpha)R^{1/2}(-s + \sqrt{\alpha^2 + Q/R})} = \frac{Q}{R^{1/2}(s - \alpha)(-s + \sqrt{\alpha^2 + Q/R})},$$

in which a common pole and zero were cancelled. Expanding into partial fractions and taking
the causal part results in

$$\{G_2 Q G_2^H (\Delta^H)^{-1}(s)\}_+ = \left. \frac{Q}{R^{1/2}(-s + \sqrt{\alpha^2 + Q/R})} \right|_{s = \alpha} \frac{1}{(s - \alpha)}$$

and

$$H_{OE}(s) = \{G_2 Q G_2^H \Delta^{-H}\}_+ \Delta^{-1}(s) = \frac{\alpha + \sqrt{\alpha^2 + Q/R}}{s + \sqrt{\alpha^2 + Q/R}}.$$

Substituting $\alpha = -1$, Q = 1 and R = 0.0001 yields $H(s) \approx 99(s + 100)^{-1}$. By
inspection, $\lim_{s \to 0} |H(s)| = \frac{99}{100}$, which illustrates the low measurement noise
asymptote (48). Some sample trajectories from a simulation conducted with $\delta_t$ =
0.001 s are shown in Fig. 5. The input measurements are shown in Fig. 5(a). It can
be seen that the filtered signal (the solid line of Fig. 5(b)) estimates the system
output (the dotted line of Fig. 5(b)).
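
The closed-form estimator of Example 13 is easy to evaluate numerically. The short Python sketch below uses a hypothetical helper, `output_estimator`, that simply evaluates the scalar expression for the optimal causal output estimator; it is included only to show how closely the rounded values 99 and 100 match the exact ones.

```python
import math

def output_estimator(alpha, Q, R):
    """Return (gain, pole) such that H_OE(s) = gain / (s - pole) for G2(s) = 1/(s - alpha)."""
    gamma = math.sqrt(alpha**2 + Q / R)
    return alpha + gamma, -gamma

gain, pole = output_estimator(alpha=-1.0, Q=1.0, R=1e-4)
dc_gain = gain / (-pole)     # |H_OE(0)|, which approaches 1 as R becomes small
```

Reducing R further moves `dc_gain` closer to unity, in keeping with the asymptote (48).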


“Science is the systematic classification of experience.” George Henry Lewes 
[Figure 5: panels (a) and (b), each plotted against Time, s.]


Fig. 5. Sample trajectories for Example 13: (a) measurement, (b) system output (dotted 
line) and filtered signal (solid line). 


1.3.4 Minimum-Mean-Square-Error Input Estimation 


In input estimation problems, it is desired to estimate the input process w(t), as
depicted in Fig. 6. This is commonly known as an equalisation problem, in which
it is desired to mitigate the distortion introduced by a communication channel
$G_2(s)$. The simplification of the general noncausal solution (44) of Theorem 2 for
the case of $G_1(s) = I$ results in

$$H_{IE}(s) = Q G_2^H \Delta^{-H} \Delta^{-1}(s). \qquad (49)$$

Equation (49) is known as the optimum minimum-mean-square-error noncausal
equaliser [17]. Assume that: $G_2(s)$ is proper, that is, the order of the numerator is
the same as the order of the denominator, and the zeros of $G_2(s)$ are in the
left-hand-plane. Under these conditions, when the measurement noise becomes
negligibly small, the equaliser estimates the inverse of the system model, that is,

$$\lim_{R \to 0} H_{IE}(s) = G_2^{-1}(s). \qquad (50)$$

The observation (50) can be verified by substituting $\Delta\Delta^H(s) = G_2 Q G_2^H(s)$ into
(49). In other words, if the channel model is invertible and signal to noise ratio is
sufficiently high, the equaliser will estimate w(t). When measurement noise is
present the equaliser no longer approximates the channel inverse because some
filtering is also required. In the limit, when the signal to noise ratio is sufficiently
low, the equaliser approaches an open circuit, namely,

$$\lim_{Q \to 0} |H_{IE}(s)| = 0. \qquad (51)$$

The observation (51) can be verified by substituting Q = 0 into (49). Thus, when
the equalisation problem is dominated by measurement noise, the estimation error
is minimised by ignoring the data.


“All of the biggest technological inventions created by man - the airplane, the automobile, the
computer - says little about his intelligence, but speaks volumes about his laziness.” Mark Raymond
Kennedy
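
The two limiting cases can be checked numerically for a scalar channel. The sketch below is illustrative only: the channel $(s + 2)(s + 1)^{-1}$ is a hypothetical proper, minimum-phase example that does not appear in the text, and the function name `equaliser_response` is an assumption. It evaluates the noncausal equaliser on the imaginary axis, where the adjoint of a real-coefficient scalar transfer function equals its complex conjugate.

```python
def equaliser_response(G2, Q, R, omega):
    """Evaluate H_IE(j*omega) = Q G2^H (G2 Q G2^H + R)^{-1} for a scalar channel G2."""
    g = G2(1j * omega)
    return Q * g.conjugate() / (g * Q * g.conjugate() + R)

def channel(s):
    # Hypothetical proper, minimum-phase channel chosen for illustration.
    return (s + 2.0) / (s + 1.0)

omega = 3.0
h_low_noise = equaliser_response(channel, Q=1.0, R=1e-12, omega=omega)
h_high_noise = equaliser_response(channel, Q=1e-12, R=1.0, omega=omega)
channel_inverse = 1.0 / channel(1j * omega)
```

With negligible measurement noise the response coincides with the channel inverse, and with negligible signal power its magnitude collapses towards zero.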


Fig. 6. The s-domain input estimation problem. 


1.4 Chapter Summary 


Continuous-time, linear, time-invariant systems can be described via either a
differential equation model or as a state-space model. Signal models can be
written in the time-domain as

$$\left( a_n \frac{d^n}{dt^n} + a_{n-1} \frac{d^{n-1}}{dt^{n-1}} + \dots + a_1 \frac{d}{dt} + a_0 \right) y(t) = \left( b_m \frac{d^m}{dt^m} + b_{m-1} \frac{d^{m-1}}{dt^{m-1}} + \dots + b_1 \frac{d}{dt} + b_0 \right) w(t).$$

Under the time-invariance assumption, the system transfer function matrices exist,
which are written as polynomial fractions in the Laplace transform variable

$$Y(s) = \left[ \frac{b_m s^m + b_{m-1} s^{m-1} + \dots + b_1 s + b_0}{a_n s^n + a_{n-1} s^{n-1} + \dots + a_1 s + a_0} \right] W(s) = G(s)W(s).$$


Thus, knowledge of a system’s differential equation is sufficient to identify its 
transfer function. If the poles of a system’s transfer function are all in the left- 
hand-plane then the system is asymptotically stable. That is, if the input to the 
system is bounded then the output of the system will be bounded. 


“Read Euler, read Euler, he is our master in everything.” Pierre-Simon Laplace 
The optimal solution minimises the energy of the error in the time domain. It is
found in the frequency domain by minimising the mean-square-error. The main
results are summarised in Table 1. The optimal noncausal solution has unstable
factors. It can only be realised by a combination of forward and backward
processes, which is known as smoothing. The optimal causal solution is also
known as the Wiener filter.


ASSUMPTIONS                                        MAIN RESULTS
-------------------------------------------------  ------------------------------------------
E{w(t)} = E{W(s)} = E{v(t)} = E{V(s)} = 0.         G_1(s) = C_1(sI - A)^{-1}B + D_1
E{w(t)w^T(t)} = E{W(s)W^H(s)} = Q > 0 and          G_2(s) = C_2(sI - A)^{-1}B + D_2
E{v(t)v^T(t)} = E{V(s)V^H(s)} = R > 0 are known.
A, B, C_1, C_2, D_1 and D_2 are known. G_1(s) and
G_2(s) are stable, i.e., Re{λ_i(A)} < 0.

Δ(s) and Δ^{-1}(s) are stable, i.e., the poles     ΔΔ^H(s) = G_2 Q G_2^H(s) + R
and zeros of Δ(s) are in the left-hand-plane.

(Non-causal solution)                              H(s) = G_1 Q G_2^H (Δ^H)^{-1} Δ^{-1}(s)

(Causal solution)                                  H(s) = {G_1 Q G_2^H (Δ^H)^{-1}}_+ Δ^{-1}(s)


Table 1. Main results for the continuous-time general filtering problem.


In output estimation problems, $C_1 = C_2$, $D_1 = D_2$, that is, $G_1(s) = G_2(s)$ and when
the measurement noise becomes negligible, the solution approaches a short circuit.
In input estimation or equalisation, $C_1 = 0$, $D_1 = I$, that is, $G_1(s) = I$ and when the
measurement noise becomes negligible, the optimal equaliser approaches the
channel inverse, provided the inverse exists. Conversely, when the problem is
dominated by measurement noise then the equaliser approaches an open circuit.


“Madam, I have come from a country where people are hanged if they talk.” Leonard Euler 
1.5 Problems


Problem 1. Find the transfer functions and comment on stability of the systems
having the following polynomial fractions.

(a) $\ddot{y} + 7\dot{y} + 12y = \ddot{w} + \dot{w} - 2w$.

(b) $\ddot{y} + \dot{y} - 20y = \ddot{w} + 5\dot{w} + 6w$.

(c) $\ddot{y} + 11\dot{y} + 30y = \ddot{w} - 7\dot{w} + 12w$.

(d) $\ddot{y} - 13\dot{y} + 42y = \ddot{w} + 9\dot{w} + 20w$.

(e) $\ddot{y} - 15\dot{y} + 56y = \ddot{w} + 11\dot{w} + 30w$.


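Problem 2. Find the transfer functions and comment on the stability for systems
having the following state-space parameters.


(a) $A = \begin{bmatrix} -7 & -12 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} -6 & -14 \end{bmatrix}$ and D = 1.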


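(b) $A = \begin{bmatrix} -1 & 20 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 4 & 26 \end{bmatrix}$ and D = 1.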
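
(c) $A = \begin{bmatrix} -11 & -30 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} -18 & -18 \end{bmatrix}$ and D = 1.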
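
(d) $A = \begin{bmatrix} 13 & -42 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 22 & -22 \end{bmatrix}$ and D = 1.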


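(e) $A = \begin{bmatrix} -15 & -56 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} -4 & -26 \end{bmatrix}$ and D = 1.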


Problem 3. Calculate the spectral factors for $\Phi_{zz}(s) = GQG^H(s) + R$ having the
following models and noise statistics.

(a) $G(s) = (s + 1)^{-1}$, Q = 2 and R = 1.

(b) $G(s) = (s + 2)^{-1}$, Q = 5 and R = 1.

(c) $G(s) = (s + 3)^{-1}$, Q = 7 and R = 1.

(d) $G(s) = (s + 4)^{-1}$, Q = 9 and R = 1.

(e) $G(s) = (s + 5)^{-1}$, Q = 11 and R = 1.


Problem 4. Calculate the optimal causal output estimators for Problem 3. 


Problem 5. Consider the error spectral density matrix

$$\begin{aligned}
\Phi_{ee}(s) = {} & [H\Delta - G_1 Q G_2^H (\Delta^H)^{-1}][H\Delta - G_1 Q G_2^H (\Delta^H)^{-1}]^H(s) \\
& + [G_1 Q G_1^H - G_1 Q G_2^H (\Delta\Delta^H)^{-1} G_2 Q G_1^H](s).
\end{aligned}$$

(a) Derive the optimal output estimator.
(b) Derive the optimal causal output estimator.
(c) Derive the optimal input estimator.


“Nothing shocks me. I'm a scientist.” Harrison Ford
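Problem 6 [16]. In respect of the configuration in Fig. 2, suppose that

$A = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -3 \end{bmatrix}$, $B = \begin{bmatrix} 25 \\ 25 \\ -25 \end{bmatrix}$, $C_2 = \begin{bmatrix} 1 & 2 & 1 \end{bmatrix}$, $C_1 = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$, D = 0, Q = 1

and R = 1. Show that the optimal causal filter is given by
$H(s) = (16.9s^2 + 86.5s + 97.3)(s^3 + 8.64s^2 + 30.3s + 50.3)^{-1}$.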
Problem 7 [18]. Suppose that $G_2 Q G_2^H(s) = \frac{3600}{s^2(s^2 - 169)}$ and R(s) = 1. Show that
the optimal causal filter for output estimation is given by
$H_{OE}(s) = (4s + 60)(s^2 + 17s + 60)^{-1}$.


1.6 Glossary 


The following terms have been introduced within this section.


$\mathbb{R}$    The space of real numbers.

$\mathbb{R}^n$    The space of real-valued n-element column vectors.

$t$    The real-valued continuous-time variable. For example, $t \in (-\infty, \infty)$ and
$t \in [0, \infty)$ denote $-\infty < t < \infty$ and $0 \le t < \infty$, respectively.

$w(t) \in \mathbb{R}^n$    A continuous-time, real-valued, n-element stationary stochastic input
signal.

$w$    The matrix of w(t) over all time, i.e., $w = [w(-\infty), ..., w(\infty)]$.

$y = \mathcal{G}w$    The output of a linear system $\mathcal{G}$ that operates on an input signal w.

$A, B, C, D$    Time-invariant state space matrices of appropriate dimension. The
system $\mathcal{G}$ is assumed to have the realisation $\dot{x}(t) = Ax(t) + Bw(t)$, $y(t) = Cx(t) + Dw(t)$
in which w(t) is known as the process noise or input signal.

$v(t)$    A stationary stochastic measurement noise signal.

$\delta(t)$    The Dirac delta function.

$Q$ and $R$    Time-invariant covariance matrices of stochastic signals w(t) and v(t),
respectively.

$s$    The Laplace transform variable.

$Y(s)$    The Laplace transform of a continuous-time signal y(t).

$G(s)$    The transfer function matrix of a system $\mathcal{G}$. For example, the transfer
function matrix of the system $\dot{x}(t) = Ax(t) + Bw(t)$, $y(t) = Cx(t) + Dw(t)$ is given by
$G(s) = C(sI - A)^{-1}B + D$.

$\langle v, w \rangle$    The inner product of two continuous-time signal vectors v and w which
is defined by $\langle v, w \rangle = \int_{-\infty}^{\infty} v^T w \, dt$.

$\|w\|_2$    The 2-norm of the continuous-time signal vector w which is defined by
$\|w\|_2 = \sqrt{\langle w, w \rangle} = \sqrt{\int_{-\infty}^{\infty} w^T w \, dt}$.

$\mathcal{L}_2$    The set of continuous-time signals having finite 2-norm, which is known
as the Lebesgue 2-space.

$\lambda_i(A)$    The i eigenvalues of A.

$\mathrm{Re}\{\lambda_i(A)\}$    The real part of the eigenvalues of A.

Asymptotic stability    A linear system $\mathcal{G}$ is said to be asymptotically stable if its
output $y \in \mathcal{L}_2$ for any $w \in \mathcal{L}_2$. If $\mathrm{Re}\{\lambda_i(A)\}$ are in the left-hand-plane or
equivalently if the real part of transfer function’s poles are in the left-hand-plane then
the system is stable.

$\mathcal{G}^H$    The adjoint of $\mathcal{G}$. The adjoint of a system having the state-space
parameters {A, B, C, D} is a system parameterised by $\{-A^T, -C^T, B^T, D^T\}$.

$G^H(s)$    The adjoint (or Hermitian transpose) of the transfer function matrix G(s).

$\Phi_{zz}(s)$    The spectral density matrix of the measurements z.

$\Delta(s)$    The spectral factor of $\Phi_{zz}(s)$ which satisfies $\Delta\Delta^H(s) = GQG^H(s) + R$
and $\Delta^{-H}(s) = (\Delta^H)^{-1}(s)$.

$G^{-1}(s)$    Inverse of the transfer function matrix G(s).

$G^{-H}(s)$    Inverse of the adjoint transfer function matrix $G^H(s)$.

$\{G(s)\}_+$    Causal part of the transfer function matrix G(s).

$H(s)$    Transfer function matrix of the minimum mean-square-error solution.

$H_{OE}(s)$    Transfer function matrix of the minimum mean-square-error solution
specialised for output estimation.

$H_{IE}(s)$    Transfer function matrix of the minimum mean-square-error solution
specialised for input estimation.


“Facts are not science - as the dictionary is not literature.” Martin Henry Fischer

1.7. References 


[1] O. Neugebauer, A history of ancient mathematical astronomy, Springer, Berlin and New
York, 1975.

[2] C. F. Gauss, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum,
Hamburg, 1809 (Translated: Theory of the Motion of the Heavenly Bodies, Dover, New
York, 1963).

[3] A. N. Kolmogorov, “Sur l’interpolation et extrapolation des suites stationaires”, Comptes
Rendus de l’Academie des Sciences, vol. 208, pp. 2043 - 2045, 1939.

[4] N. Wiener, Extrapolation, interpolation and smoothing of stationary time series with
engineering applications, The MIT Press, Cambridge Mass.; Wiley, New York; Chapman &
Hall, London, 1949.

[5] P. Masani, “Wiener’s Contributions to Generalized Harmonic Analysis, Prediction Theory
and Filter Theory”, Bulletin of the American Mathematical Society, vol. 72, no. 1, pt. 2, pp.
73 - 125, 1966.

[6] T. Kailath, Lectures on Wiener and Kalman Filtering, Springer Verlag, Wien; New York,
1981.

[7] M.-A. Parseval Des Chênes, Mémoires présentés à l'Institut des Sciences, Lettres et Arts, par
divers savans, et lus dans ses assemblées. Sciences mathématiques et physiques (Savans
étrangers), vol. 1, pp. 638 - 648, 1806.

[8] C. A. Desoer and M. Vidyasagar, Feedback Systems: Input Output Properties, Academic
Press, N.Y., 1975.

[9] G. A. Einicke, “Asymptotic Optimality of the Minimum-Variance Fixed-Interval Smoother”,
IEEE Transactions on Signal Processing, vol. 55, no. 4, pp. 1543 - 1547, Apr. 2007.

[10] T. Kailath, Linear Systems, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1980.

[11] D. J. N. Limebeer, B. D. O. Anderson, P. Khargonekar and M. Green, “A Game Theoretic
Approach to H∞ Control for Time-varying Systems”, SIAM Journal on Control and
Optimization, vol. 30, no. 2, pp. 262 - 283, 1992.

[12] M. Green and D. J. N. Limebeer, Linear Robust Control, Prentice-Hall Inc, Englewood
Cliffs, New Jersey, 1995.

[13] C. S. Burrus, J. H. McClellan, A. V. Oppenheim, T. W. Parks, R. W. Schafer and H. W.
Schuessler, Computer-Based Exercises for Signal Processing Using Matlab, Prentice-Hall,
Englewood Cliffs, New Jersey, 1994.

[14] U. Shaked, “A general transfer function approach to linear stationary filtering and steady
state optimal control problems”, International Journal of Control, vol. 24, no. 6, pp. 741 -
770, 1976.

[15] A. H. Sayed and T. Kailath, “A Survey of Spectral Factorization Methods”, Numerical
Linear Algebra with Applications, vol. 8, pp. 467 - 496, 2001.

[16] U. Shaked, “H∞-Minimum Error State Estimation of Linear Stationary Processes”, IEEE
Transactions on Automatic Control, vol. 35, no. 5, pp. 554 - 558, May 1990.

[17] S. A. Kassam and H. V. Poor, “Robust Techniques for Signal Processing: A Survey”,
Proceedings of the IEEE, vol. 73, no. 3, pp. 433 - 481, Mar. 1985.

[18] A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communications and
Control, McGraw-Hill Book Company, New York, 1971.


“Facts are stupid things.” Ronald Wilson Reagan


“All science is either physics or stamp collecting.” Baron William Thomson Kelvin 




2. Discrete-Time, Minimum-Mean-Square-Error Filtering


2.1 Introduction 


This chapter reviews the solutions for the discrete-time, linear stationary filtering 
problems that are attributed to Wiener [1] and Kolmogorov [2]. As in the 
continuous-time case, a model-based approach is employed. Here, a linear model 
is specified by the coefficients of the input and output difference equations. It is 
shown that the same coefficients appear in the system’s (frequency domain) 
transfer function. In other words, frequency domain model representations can be 
written down without background knowledge of z-transforms. 


In the 1960s and 1970s, continuous-time filters were implemented on analogue 
computers. This technology has been largely discontinued for two main reasons. 
First, analogue multipliers and op amp circuits exhibit poor performance 
whenever (temperature-sensitive) calibrations become out of date. Second, 
updated software releases are faster to turn around than hardware design iterations. 
Continuous-time filters are now routinely implemented using digital computers, 
provided that the signal sampling rates and data processing rates are sufficiently 
high. Alternatively, continuous-time model parameters may be converted into 
discrete-time and differential equations can be transformed into difference 
equations. The ensuing discrete-time filter solutions are then amenable to more 
economical implementation, namely, employing relatively lower processing rates. 


The discrete-time Wiener filtering problem is solved in the frequency domain. 
Once again, it is shown that the optimum minimum-mean-square-error solution is 
found by completing the square. The optimum solution is noncausal, which can 
only be implemented by forward and backward processes. This solution is actually 
a smoother and the optimum filter is found by taking the causal part. 


The developments rely on solving a spectral factorisation problem, which requires 
pole-zero cancellations. Therefore, some pertinent discrete-time concepts are 
introduced in Section 2.2 prior to deriving the filtering results. The discussion of 
the prerequisite concepts is comparatively brief since it mirrors the continuous- 
time material introduced previously. In Section 2.3 it is shown that the structure of 
the filter solutions is unchanged — only the spectral factors are calculated 
differently. 


“If we value the pursuit of knowledge, we must be free to follow wherever that search may lead us. 
The free mind is not a barking dog, to be tethered on a ten foot-chain.” Adlai Ewing Stevenson Jr. 




2.2 Prerequisites 
2.2.1 Spaces 


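A discrete-time, real-valued stochastic process is denoted as
$w_k = [w_{1,k}, w_{2,k}, ..., w_{n,k}]^T \in \mathbb{R}^n$ for integer time step $k \in (-\infty, \infty)$. Suppose that $w_k$, $v_k$ are
scalars, i.e., $w_k, v_k \in \mathbb{R}$. The sets of $w_k$ and $v_k$ over all k are denoted by row vectors
$w = [w_{-\infty}, ..., w_{\infty}]$ and $v = [v_{-\infty}, ..., v_{\infty}]$. The inner product $\langle v, w \rangle$ of two discrete-time vectors
v and w is defined by

$$\langle v, w \rangle = \sum_{k=-\infty}^{\infty} v_k w_k. \qquad (1)$$

The 2-norm or Euclidean norm of a discrete-time vector process w is defined as
$\|w\|_2 = \sqrt{\langle w, w \rangle} = \sqrt{\sum_{k=-\infty}^{\infty} w_k^2}$. The square of the 2-norm, that is,
$\|w\|_2^2 = \langle w, w \rangle = \sum_{k=-\infty}^{\infty} w_k^2$, is commonly known as the energy of the signal w.


Consider instead $w_k, v_k \in \mathbb{R}^n$, n > 1. Then the sets of $w_k$ and $v_k$ over all k are
denoted by matrices $w = [w_{-\infty}, ..., w_{\infty}]$ and $v = [v_{-\infty}, ..., v_{\infty}]$. The inner product of v
and w is defined by $\langle v, w \rangle = \mathrm{trace}(v^T w) = \sum_{i=1}^{n} \sum_{k=-\infty}^{\infty} v_{i,k} w_{i,k}$.

Example 1. Consider $w_k = [w_{1,k}, w_{2,k}, ..., w_{n,k}]^T$ over $k \in [1, N]$ and

$$w = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,N} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,N} \\ \vdots & \vdots & & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,N} \end{bmatrix}.$$

It is easily verified that the energy of w is $\langle w, w \rangle = \mathrm{trace}(w^T w)$ =
$(w_{1,1}^2 + w_{2,1}^2 + ... + w_{n,1}^2) + (w_{1,2}^2 + w_{2,2}^2 + ... + w_{n,2}^2) + ... + (w_{1,N}^2 + w_{2,N}^2 + ... + w_{n,N}^2)$ =
$\sum_{i=1}^{n} \sum_{k=1}^{N} w_{i,k}^2$, which reverts to (1) for n = 1.


The Lebesgue 2-space is denoted by $\ell_2$ and is defined as the set of discrete-time
processes having a finite 2-norm. Thus, $w \in \ell_2$ means that the energy of w is
bounded. See [3] for more detailed discussions of spaces and norms.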
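
The equivalence between the trace form and the double sum of squares is easily checked numerically. The following Python sketch uses an arbitrary random matrix as a stand-in for a finite record of a vector process; the dimensions n = 3 and N = 5 are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 5                         # hypothetical signal dimension and record length
w = rng.standard_normal((n, N))     # column k holds the vector w_k

energy_trace = np.trace(w.T @ w)    # <w, w> = trace(w^T w)
energy_sum = np.sum(w**2)           # double sum of squared elements
```

Both quantities agree to within rounding error, and for n = 1 they reduce to the scalar sum of squares.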


“To live effectively is to live with adequate information.” Norbert Wiener 
2.2.2 Discrete-time Polynomial Fraction Systems


Consider a linear, time-invariant system $\mathcal{G}$ that operates on a scalar input signal
$w = [w_1, ..., w_N] \in \mathbb{R}^N$ and produces a scalar output signal $y = [y_1, ..., y_N] \in \mathbb{R}^N$,
i.e., $y = \mathcal{G}w$. Such a system can be realised by the difference equation

$$a_n y_{k-n} + a_{n-1} y_{k-n+1} + \dots + a_1 y_{k-1} + a_0 y_k = b_m w_{k-m} + b_{m-1} w_{k-m+1} + \dots + b_1 w_{k-1} + b_0 w_k, \qquad (2)$$

where $a_0, ..., a_n, b_0, ..., b_m$ are real-valued constant coefficients, with $a_n \neq 0$ and
zero initial conditions. Difference equations of the form (2) are called
discrete-time polynomial systems.


Example 2. The difference equation $y_k = 0.1x_k + 0.2x_{k-1} + 0.3y_{k-1}$ specifies a
system in which the coefficients are $a_0 = 1$, $a_1 = -0.3$, $b_0 = 0.1$ and $b_1 = 0.2$. Note
that $y_k$ is known as the current output and $y_{k-1}$ is known as a past output.
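
A difference equation of this kind can be run directly as a recursion. The Python sketch below simulates the recursion $y_k = 0.1x_k + 0.2x_{k-1} + 0.3y_{k-1}$ from zero initial conditions; the function name `simulate` and the unit-impulse input are assumptions made for illustration.

```python
def simulate(x):
    """Run y_k = 0.1 x_k + 0.2 x_{k-1} + 0.3 y_{k-1} from zero initial conditions."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for x_k in x:
        y_k = 0.1 * x_k + 0.2 * x_prev + 0.3 * y_prev   # current output from current and past values
        y.append(y_k)
        x_prev, y_prev = x_k, y_k
    return y

impulse_response = simulate([1.0, 0.0, 0.0, 0.0])
```

The first four samples of the impulse response are 0.1, 0.23, 0.069 and 0.0207, after which each sample is 0.3 times the previous one.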


2.2.3. The Z-Transform of a Discrete-time Sequence 


The two-sided z-transform of a discrete-time process, $y_k$, is denoted by Y(z) and is
defined by

$$Y(z) = \sum_{k=-\infty}^{\infty} y_k z^{-k}, \qquad (3)$$

where $z = e^{j\omega T_s}$ and $j = \sqrt{-1}$. Given a process $y_k$ with z-transform Y(z), $y_k$ can be
calculated from Y(z) by taking the inverse z-transform of Y(z),

$$y_k = \frac{1}{2\pi j} \oint_{|z|=1} Y(z) z^{k-1}\,dz. \qquad (4)$$

Theorem 1 Parseval’s Theorem:

$$\sum_{k=-\infty}^{\infty} |y_k|^2 = \frac{1}{2\pi j} \oint_{|z|=1} |Y(z)|^2 z^{-1}\,dz. \qquad (5)$$

That is, the energy in the time domain equals the energy in the frequency domain.
2.2.4 Polynomial Fraction Transfer Functions 


In the continuous-time case, a system’s differential equations lead to a transfer
function in the Laplace transform variable. Here, in discrete-time, a system’s
difference equations lead to a transfer function in the z-transform variable.
Applying the z-transform to both sides of (2) yields

$$(a_n z^{-n} + a_{n-1} z^{-n+1} + \dots + a_1 z^{-1} + a_0) Y(z) = (b_m z^{-m} + b_{m-1} z^{-m+1} + \dots + b_1 z^{-1} + b_0) W(z). \qquad (6)$$

Therefore

$$Y(z) = \left[ \frac{b_m z^{-m} + b_{m-1} z^{-m+1} + \dots + b_1 z^{-1} + b_0}{a_n z^{-n} + a_{n-1} z^{-n+1} + \dots + a_1 z^{-1} + a_0} \right] W(z) = G(z)W(z), \qquad (7)$$

where

$$G(z) = \left[ \frac{b_m z^{-m} + b_{m-1} z^{-m+1} + \dots + b_1 z^{-1} + b_0}{a_n z^{-n} + a_{n-1} z^{-n+1} + \dots + a_1 z^{-1} + a_0} \right] \qquad (8)$$

is known as the transfer function of the system. It can be seen that knowledge of
the system difference equation (2) is sufficient to identify its transfer function (8).


“There is no philosophy which is not founded upon knowledge of the phenomena, but to get any profit
from this knowledge it is absolutely necessary to be a mathematician.” Daniel Bernoulli


2.2.5 Poles and Zeros 


The numerator and denominator polynomials of (8) can be factored into m and n
linear factors, respectively, to give

$$G(z) = \frac{b_m (z - \beta_1)(z - \beta_2) \dots (z - \beta_m)}{a_n (z - \alpha_1)(z - \alpha_2) \dots (z - \alpha_n)}. \qquad (9)$$

The numerator of G(z) is zero when $z = \beta_i$, i = 1 ... m. These values of z are called
the zeros of G(z). Zeros inside the unit circle are called minimum-phase whereas
zeros outside the unit circle are called non-minimum phase. The denominator of
G(z) is zero when $z = \alpha_i$, i = 1 ... n. These values of z are called the poles of G(z).


Example 3. Consider a system described by the difference equation
$y_k + 0.3y_{k-1} - 0.04y_{k-2} = w_k + 0.5w_{k-1}$. It follows from (2) and (8) that the corresponding transfer
function is given by

$$G(z) = \frac{1 + 0.5z^{-1}}{1 + 0.3z^{-1} - 0.04z^{-2}} = \frac{z^2 + 0.5z}{z^2 + 0.3z - 0.04} = \frac{z(z + 0.5)}{(z - 0.1)(z + 0.4)},$$

which possesses poles at z = 0.1, -0.4 and zeros at z = 0, -0.5.
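
Poles and zeros can also be found numerically from the polynomial coefficients. The Python sketch below assumes the factored transfer function $z(z + 0.5)[(z - 0.1)(z + 0.4)]^{-1}$ and uses NumPy's polynomial root finder; the variable names are illustrative.

```python
import numpy as np

num = [1.0, 0.5, 0.0]                           # z^2 + 0.5 z
den = np.polymul([1.0, -0.1], [1.0, 0.4])       # (z - 0.1)(z + 0.4)

zeros = np.sort(np.roots(num))
poles = np.sort(np.roots(den))
minimum_phase = bool(np.all(np.abs(zeros) < 1.0))   # all zeros inside the unit circle
```

The roots returned are the zeros at -0.5 and 0 and the poles at -0.4 and 0.1; since both zeros lie inside the unit circle, the transfer function is minimum-phase.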


“A mathematician is a blind man in a dark room looking for a black cat which isn't there.” Charles 
Robert Darwin 
2.2.6 Polynomial Fraction Transfer Function Matrix


In the single-input-single-output case, it is assumed that w(z), G(z) and $y(z) \in \mathbb{R}$.
In the multiple-input-multiple-output case, G(z) is a transfer function matrix. For
example, suppose that $w(z) \in \mathbb{R}^m$, $y(z) \in \mathbb{R}^p$, then $G(z) \in \mathbb{R}^{p \times m}$, namely

$$G(z) = \begin{bmatrix} G_{11}(z) & G_{12}(z) & \cdots & G_{1m}(z) \\ G_{21}(z) & G_{22}(z) & & \\ \vdots & & \ddots & \\ G_{p1}(z) & & \cdots & G_{pm}(z) \end{bmatrix}, \qquad (10)$$

where the components $G_{ij}(z)$ have the polynomial transfer function form within (8)
or (9).


2.2.7 State-Space Transfer Function Matrix 
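The transfer function matrix (10) can be written in the state-space representation

$$G(z) = C(zI - A)^{-1}B + D, \qquad (11)$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$ and $D \in \mathbb{R}^{p \times m}$.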


Example 4. For a state-space model with A = -0.5, B = C = 1 and D = 0, the
transfer function is $G(z) = (z + 0.5)^{-1}$.
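
The state-space formula (11) can be evaluated numerically at any chosen value of z. The following Python sketch does this for the scalar parameters A = -0.5, B = C = 1 and D = 0; the function name `transfer_function` and the test point z = 2 are assumptions made for illustration.

```python
import numpy as np

def transfer_function(A, B, C, D, z):
    """Evaluate G(z) = C (zI - A)^{-1} B + D at a single complex point z."""
    A, B, C, D = (np.atleast_2d(np.asarray(M, dtype=complex)) for M in (A, B, C, D))
    n = A.shape[0]
    # Solve (zI - A) X = B rather than forming the inverse explicitly.
    return C @ np.linalg.solve(z * np.eye(n) - A, B) + D

G_at_2 = transfer_function(-0.5, 1.0, 1.0, 0.0, z=2.0)
```

At z = 2 the result is 1/(2 + 0.5) = 0.4, which agrees with evaluating $(z + 0.5)^{-1}$ directly.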


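Example 5. For state-space parameters $A = \begin{bmatrix} -0.3 & 0.04 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$,
$C = \begin{bmatrix} 0.2 & 0.04 \end{bmatrix}$ and D = 1, the use of Cramer’s rule, that is,
$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, yields the transfer function
$G(z) = \frac{z(z + 0.5)}{(z - 0.1)(z + 0.4)}$.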


2.2.8 State-Space Realisation 


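The state-space transfer function matrix (11) can be realised as a discrete-time
system $\mathcal{G}: \mathbb{R}^m \to \mathbb{R}^p$

$$x_{k+1} = Ax_k + Bw_k, \tag{12}$$
$$y_k = Cx_k + Dw_k, \tag{13}$$

where $x_k \in \mathbb{R}^n$ is a state vector. This system is depicted in Fig. 1. The matrices $A = \{a_{ij}\}$,
$B = \{b_{ij}\}$, $C = \{c_{ij}\}$ and $D = \{d_{ij}\}$ are known as a state matrix, an input mapping,
an output mapping and a direct feed-through matrix, respectively. For ease of
understanding, the state-space model (12)-(13) can be written as

$$\begin{bmatrix} x_{1,k+1} \\ \vdots \\ x_{n,k+1} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nm} \end{bmatrix}\begin{bmatrix} w_{1,k} \\ \vdots \\ w_{m,k} \end{bmatrix},$$

$$\begin{bmatrix} y_{1,k} \\ \vdots \\ y_{p,k} \end{bmatrix} = \begin{bmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{p1} & \cdots & c_{pn} \end{bmatrix}\begin{bmatrix} x_{1,k} \\ \vdots \\ x_{n,k} \end{bmatrix} + \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{p1} & \cdots & d_{pm} \end{bmatrix}\begin{bmatrix} w_{1,k} \\ \vdots \\ w_{m,k} \end{bmatrix}.$$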


Fig. 1. Discrete-time state-space system. 


It is assumed that $w_k$ is a zero-mean, stationary process with $E\{w_k w_k^T\} = Q$. In most
applications, discrete-time implementations are desired; however, the polynomial
fraction transfer function or state-space transfer function parameters may be
known in continuous-time. Therefore, two methods for transforming continuous-time
parameters to discrete-time are set out below.


2.2.9 The Bilinear Approximation 


Transfer functions in the z-plane can be mapped exactly to the s-plane by
substituting $z = e^{sT_s}$, where $s = j\omega$ and $T_s$ is the sampling period. Conversely, the
substitution

$$s = \frac{1}{T_s}\log(z) = \frac{2}{T_s}\left[\frac{z-1}{z+1} + \frac{1}{3}\left(\frac{z-1}{z+1}\right)^3 + \frac{1}{5}\left(\frac{z-1}{z+1}\right)^5 + \frac{1}{7}\left(\frac{z-1}{z+1}\right)^7 + \cdots\right] \tag{14}$$

can be used to map s-plane transfer functions into the z-plane. The bilinear
transform is a first order approximation to (14), namely,

$$s \approx \frac{2}{T_s}\,\frac{z-1}{z+1}. \tag{15}$$


Example 6. Consider the continuous-time transfer function $H(s) = (s + 2)^{-1}$ with
$T_s = 2$. Substituting (15) yields the discrete-time transfer function $H(z) = (z + 1)(3z + 1)^{-1}$.
The higher order terms within the series of (14) can be included to improve the
accuracy of converting a continuous-time model to discrete time.
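The substitution in the example above can be confirmed numerically. The sketch below is illustrative only; the function names are hypothetical. It evaluates $H(s) = (s+2)^{-1}$ at the bilinear image of points on the unit circle, compares the result with $(z+1)(3z+1)^{-1}$, and shows how closely the first-order term tracks the exact map $s = \log(z)/T_s$ at a low frequency.

```python
import cmath


def bilinear_s(z, Ts):
    """First-order (bilinear) approximation of s = log(z) / Ts."""
    return (2.0 / Ts) * (z - 1.0) / (z + 1.0)


def H_s(s):
    """Continuous-time transfer function of the example."""
    return 1.0 / (s + 2.0)


def H_z_closed_form(z):
    """Discrete-time transfer function quoted in the example."""
    return (z + 1.0) / (3.0 * z + 1.0)


Ts = 2.0
# Points z = exp(j w Ts) on the unit circle, below the Nyquist frequency pi / Ts.
points = [cmath.exp(1j * w * Ts) for w in (0.05, 0.2, 0.5, 1.0)]
max_err = max(abs(H_s(bilinear_s(z, Ts)) - H_z_closed_form(z)) for z in points)

# Accuracy of the first-order term at a low frequency.
w = 0.05
z = cmath.exp(1j * w * Ts)
exact_s = cmath.log(z) / Ts      # equals j*w when |w * Ts| < pi
approx_s = bilinear_s(z, Ts)     # equals j * (2 / Ts) * tan(w * Ts / 2)
print(max_err, abs(approx_s - exact_s))
```

The discrepancy between `approx_s` and `exact_s` grows with frequency, which is the frequency warping that the higher order terms of the series would reduce.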


2.2.10 Discretisation of Continuous-time Systems 


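The discrete-time state-space parameters, denoted here by $\{A_D, B_D, C_D, D_D, Q_D, R_D\}$,
can be obtained by discretising the continuous-time system

$$\dot{x}(t) = A_C x(t) + B_C w(t), \tag{16}$$
$$y(t) = C_C x(t) + D_C w(t), \tag{17}$$
$$z(t) = y(t) + v(t), \tag{18}$$

where $E\{w(t)w^T(\tau)\} = Q_C\delta(t - \tau)$ and $E\{v(t)v^T(\tau)\} = R_C\delta(t - \tau)$. Premultiplying
(16) by $e^{-A_C t}$ and recognising that $\frac{d}{dt}\left(e^{-A_C t}x(t)\right) = e^{-A_C t}\dot{x}(t) - e^{-A_C t}A_C x(t)$ yields

$$\frac{d}{dt}\left(e^{-A_C t}x(t)\right) = e^{-A_C t}B_C w(t). \tag{19}$$

Integrating (19) results in

$$e^{-A_C t}x(t) - e^{-A_C t_0}x(t_0) = \int_{t_0}^{t} e^{-A_C \tau}B_C w(\tau)\,d\tau \tag{20}$$

and hence

$$x(t) = e^{A_C(t - t_0)}x(t_0) + e^{A_C t}\int_{t_0}^{t} e^{-A_C \tau}B_C w(\tau)\,d\tau = e^{A_C(t - t_0)}x(t_0) + \int_{t_0}^{t} e^{A_C(t - \tau)}B_C w(\tau)\,d\tau \tag{21}$$

is a solution to the differential equation (16). Suppose that $x(t)$ is available at
integer $k$ multiples of $T_s$. Assuming that $w(t)$ is constant during the sampling
interval and substituting $t_0 = kT_s$, $t = (k+1)T_s$ into (21) yields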


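$$x((k+1)T_s) = e^{A_C T_s}x(kT_s) + \int_{kT_s}^{(k+1)T_s} e^{A_C((k+1)T_s - \tau)}B_C\,d\tau\; w(kT_s). \tag{22}$$

With the identifications $x_k = x(kT_s)$ and $w_k = w(kT_s)$ in (22), it can be seen that

$$A_D = e^{A_C T_s}, \tag{23}$$
$$B_D = \int_{kT_s}^{(k+1)T_s} e^{A_C((k+1)T_s - \tau)}B_C\,d\tau. \tag{24}$$

The $\tau$ within the definite integral (24) varies from $kT_s$ to $(k+1)T_s$. For a change of
variable $\lambda = (k+1)T_s - \tau$, the limits of integration become $\lambda = T_s$ and $\lambda = 0$, which
results in the simplification

$$B_D = -\int_{T_s}^{0} e^{A_C\lambda}B_C\,d\lambda = \int_{0}^{T_s} e^{A_C\lambda}B_C\,d\lambda. \tag{25}$$

Denoting $E\{w_j w_k^T\} = Q_D\delta_{jk}$ and using (25) it can be shown that [4]

$$Q_D = \int_{0}^{T_s} e^{A_C\lambda}B_C Q_C B_C^T e^{A_C^T\lambda}\,d\lambda. \tag{26}$$

The exponential matrix is defined as

$$e^{A_C t} = I + A_C t + \frac{A_C^2 t^2}{2} + \cdots + \frac{A_C^N t^N}{N!} + \cdots, \tag{27}$$

which leads to

$$A_D = I + A_C T_s + \frac{(A_C T_s)^2}{2!} + \frac{(A_C T_s)^3}{3!} + \frac{(A_C T_s)^4}{4!} + \cdots, \tag{28}$$
$$B_D = B_C T_s + \frac{A_C B_C T_s^2}{2!} + \frac{A_C^2 B_C T_s^3}{3!} + \frac{A_C^3 B_C T_s^4}{4!} + \cdots, \tag{29}$$
$$Q_D = B_C Q_C B_C^T T_s + \frac{(A_C B_C Q_C B_C^T + B_C Q_C B_C^T A_C^T)T_s^2}{2!} + \cdots. \tag{30}$$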


It is common practice ([4] - [6]) to truncate the above series after terms linear in
$T_s$. Some higher order terms can be retained in applications where parameter
accuracy is critical. Since the limit as $N \to \infty$ of $T_s^N / N!$ is 0, the above series are
valid for any value of $T_s$. However, the sample period needs to be sufficiently
small, otherwise the above discretisations will be erroneous. According to the
Nyquist-Shannon sampling theorem, the sampling rate is required to be at least
twice the highest frequency component of the continuous-time signal. In respect of
(17), the output map may be written as

$$y(kT_s) = C_C x(kT_s) + D_C w(kT_s) \tag{31}$$

and thus

$$C_D = C_C, \tag{32}$$
$$D_D = D_C. \tag{33}$$


Following the approach of [7], it is assumed that the continuous-time signals are
integrated between samples, for example, the discretised measurement noise is
$v(kT_s) = \dfrac{1}{T_s}\displaystyle\int_{kT_s}^{(k+1)T_s} v(\tau)\,d\tau$. Then the corresponding measurement noise covariance
is

$$R_D = \frac{1}{T_s^2}\int_{kT_s}^{(k+1)T_s} R_C\,d\tau = \frac{1}{T_s}R_C. \tag{34}$$


In some applications, such as inertial and satellite navigation [8], the underlying 
dynamic equations are in continuous-time, whereas the filters are implemented in 
discrete-time. In this case, any underlying continuous-time equations together with 
(28) — (30) can be calculated within a high rate foreground task, so that the 
discretised state-space parameters will be sufficiently accurate. The discrete-time 
filter recursions can then be executed within a lower rate background task. 
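The effect of truncating the series (28)-(30) can be illustrated for a scalar system, where the integrals (23), (25) and (26) have closed forms. The following is a minimal sketch under that scalar assumption; the function names and the numerical values of `a`, `b`, `q` and `Ts` are hypothetical and chosen only for illustration.

```python
import math


def discretise_exact(a, b, q, Ts):
    """Closed forms of (23), (25), (26) for the scalar system dx/dt = a x + b w, a != 0."""
    A_D = math.exp(a * Ts)
    B_D = (math.exp(a * Ts) - 1.0) / a * b
    Q_D = (math.exp(2.0 * a * Ts) - 1.0) / (2.0 * a) * b * q * b
    return A_D, B_D, Q_D


def discretise_series(a, b, q, Ts, order):
    """Scalar versions of the series (28)-(30), truncated after terms of degree `order` in Ts."""
    A_D = sum((a * Ts) ** n / math.factorial(n) for n in range(order + 1))
    B_D = sum(a ** (n - 1) * b * Ts ** n / math.factorial(n) for n in range(1, order + 1))
    Q_D = sum((2.0 * a) ** (n - 1) * b * q * b * Ts ** n / math.factorial(n)
              for n in range(1, order + 1))
    return A_D, B_D, Q_D


a, b, q, Ts = -1.0, 1.0, 1.0, 0.1      # illustrative values only
exact = discretise_exact(a, b, q, Ts)
first_order = discretise_series(a, b, q, Ts, order=1)    # A_D = 1 + a Ts, B_D = b Ts, Q_D = b q b Ts
fourth_order = discretise_series(a, b, q, Ts, order=4)

err1 = max(abs(x - y) for x, y in zip(first_order, exact))
err4 = max(abs(x - y) for x, y in zip(fourth_order, exact))
print(err1, err4)
```

For these values the linear truncation is accurate to about two decimal places, whereas retaining terms up to $T_s^4$ reduces the error by several orders of magnitude; the gap widens as $|a|T_s$ grows.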


2.2.11 Asymptotic Stability 


Consider a discrete-time, linear, time-invariant system $\mathcal{G}$ that operates on an
input process $w$ and produces an output process $y$. The system $\mathcal{G}$ is said to be
asymptotically stable if the output remains bounded, that is, $y \in \ell_2$, for any input
$w \in \ell_2$. Two equivalent conditions for $\mathcal{G}$ to be asymptotically stable are as
follows.
(i) The eigenvalues of the system's state matrix are inside the unit circle,
that is, for $A$ of (11), $|\lambda_i(A)| < 1$.
(ii) The poles of the system's transfer function are inside the unit circle, that
is, for $\alpha_i$ of (9), $|\alpha_i| < 1$.


Example 7. A state-space system having $A = -0.5$, $B = C = 1$ and $D = 0$ is stable,
since $\lambda(A) = -0.5$ is in the unit circle. Equivalently, the corresponding transfer
function $G(z) = (z + 0.5)^{-1}$ has a pole at $z = -0.5$ which is inside the unit circle and
so the system is stable.


2.2.12 Adjoint Systems


Consider again a linear system $\mathcal{G}$ operating on an input $w \in \mathbb{R}^m$ and producing
an output $\mathcal{G}w \in \mathbb{R}^p$. Then $\mathcal{G}^H$, the adjoint of $\mathcal{G}$, is the
unique linear system which produces an output signal $\mathcal{G}^H\alpha \in \mathbb{R}^m$, such that
$\langle \alpha, \mathcal{G}w \rangle = \langle \mathcal{G}^H\alpha, w \rangle$ for all $\alpha \in \mathbb{R}^p$ and $w \in \mathbb{R}^m$. The following
derivation is a simplification of the time-varying version that appears in [9].


Lemma 1 (State-space representation of an adjoint system): Suppose that a
discrete-time linear time-invariant system $\mathcal{G}$ is described by

$$x_{k+1} = Ax_k + Bw_k, \tag{35}$$
$$y_k = Cx_k + Dw_k, \tag{36}$$

with $x_0 = 0$. The adjoint $\mathcal{G}^H$ is the linear system having the realisation

$$\zeta_{k-1} = A^T\zeta_k - C^T\alpha_k, \tag{37}$$
$$\beta_k = -B^T\zeta_k + D^T\alpha_k, \tag{38}$$

with $\zeta_N = 0$.
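Proof: The system (35)-(36) can be written equivalently as

$$\begin{bmatrix} x_{k+1} \\ y_k \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} x_k \\ w_k \end{bmatrix} \tag{39}$$

with $x_0 = 0$. Thus

$$\langle \alpha, \mathcal{G}w \rangle = \sum_{k=0}^{N} \zeta_k^T x_{k+1} - \sum_{k=0}^{N} \zeta_k^T(Ax_k + Bw_k) + \sum_{k=0}^{N} \alpha_k^T(Cx_k + Dw_k) \tag{40}$$

$$= \sum_{k=0}^{N} (-B^T\zeta_k + D^T\alpha_k)^T w_k = \langle \mathcal{G}^H\alpha, w \rangle, \tag{41}$$

in which the first two sums of (40) cancel by virtue of (35), and the terms in $x_k$ vanish
in passing to (41) because $\zeta_{k-1} = A^T\zeta_k - C^T\alpha_k$, $\zeta_N = 0$ and $x_0 = 0$. Here $\mathcal{G}^H$ is given by (37)-(38).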


Thus, the adjoint of a discrete-time system having the parameters $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ is a
system with parameters $\begin{bmatrix} A^T & -C^T \\ -B^T & D^T \end{bmatrix}$. Adjoint systems have the property
$(\mathcal{G}^H)^H = \mathcal{G}$. The adjoint of the transfer function matrix $G(z)$ is denoted as $G^H(z)$
and is defined by the transfer function matrix

$$G^H(z) = G^T(z^{-1}). \tag{42}$$


Example 8. Suppose that a system $\mathcal{G}$ has the state-space parameters $A = -0.5$
and $B = C = D = 1$. From Lemma 1, an adjoint system has the state-space
parameters $A = -0.5$, $B = C = -1$, $D = 1$ and the corresponding transfer function is
$G^H(z) = 1 + (z^{-1} + 0.5)^{-1} = (3z + 2)(z + 2)^{-1}$, which has a pole at $z = -2$ outside the
unit circle and is therefore unstable as a causal system. Alternatively, the adjoint of
$G(z) = 1 + (z + 0.5)^{-1} = (z + 1.5)(z + 0.5)^{-1}$ can
be obtained using (42), namely, $G^H(z) = G^T(z^{-1}) = (3z + 2)(z + 2)^{-1}$.
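The defining property $\langle \alpha, \mathcal{G}w \rangle = \langle \mathcal{G}^H\alpha, w \rangle$ can be checked numerically over a finite interval. The sketch below is an illustration only: it assumes scalar parameters, reads the adjoint realisation of Lemma 1 as the backward recursion $\zeta_{k-1} = A^T\zeta_k - C^T\alpha_k$, $\beta_k = -B^T\zeta_k + D^T\alpha_k$ with $\zeta_N = 0$, and uses hypothetical function names and randomly generated test signals.

```python
import random


def run_system(A, B, C, D, w):
    """Forward recursion x_{k+1} = A x_k + B w_k, y_k = C x_k + D w_k with x_0 = 0."""
    x, y = 0.0, []
    for wk in w:
        y.append(C * x + D * wk)
        x = A * x + B * wk
    return y


def run_adjoint(A, B, C, D, alpha):
    """Backward recursion of the adjoint with zeta_N = 0, where N is the final index."""
    N = len(alpha) - 1
    beta = [0.0] * (N + 1)
    zeta = 0.0                                # zeta_N
    for k in range(N, -1, -1):
        beta[k] = -B * zeta + D * alpha[k]    # output equation at time k
        zeta = A * zeta - C * alpha[k]        # zeta_{k-1}
    return beta


def inner(u, v):
    """Inner product of two finite scalar sequences."""
    return sum(uk * vk for uk, vk in zip(u, v))


random.seed(0)
A, B, C, D = -0.5, 1.0, 1.0, 1.0              # parameters of the example above
w = [random.gauss(0.0, 1.0) for _ in range(50)]
alpha = [random.gauss(0.0, 1.0) for _ in range(50)]

lhs = inner(alpha, run_system(A, B, C, D, w))
rhs = inner(run_adjoint(A, B, C, D, alpha), w)
print(lhs, rhs)
```

The two inner products agree to rounding error for any pair of test signals, which is the property that characterises the adjoint.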


2.2.13 Causal Systems 


A causal system is a system whose output depends exclusively on past and current 
inputs and outputs. 


Example 9. Consider $x_{k+1} = 0.3x_k + 0.4x_{k-1} + w_k$. Since the output $x_{k+1}$ depends only
on past states $x_k$, $x_{k-1}$ and past inputs $w_k$, this system is causal.


Example 10. Consider $x_k = 0.3x_{k+1} + 0.4x_k + w_{k+1}$. Since the output $x_k$ depends on
future outputs $x_{k+1}$ and future inputs $w_{k+1}$, this system is non-causal.


2.2.14 Realising Unstable System Components 


Unstable system components are termed unrealisable because their outputs are not
in $\ell_2$, that is, they are unbounded. In other words, unstable systems cannot produce
a useful output. However, an unstable causal component can be realised as a stable
non-causal or backwards component. Consider the system $\mathcal{G}$ (35)-(36) in which
the eigenvalues of $A$ all lie outside the unit circle. In this case, a stable adjoint
system $\beta = \mathcal{G}^H\alpha$ can be realised by the following three-step procedure.

(i) Time-reverse the input signal $\alpha_k$, that is, construct $\alpha_\tau$, where $\tau = N - k$ is a
time-to-go variable.
(ii) Realise the stable system $\mathcal{G}^T$

$$\zeta_{\tau+1} = A^T\zeta_\tau + C^T\alpha_\tau, \tag{43}$$
$$\beta_\tau = B^T\zeta_\tau + D^T\alpha_\tau, \tag{44}$$

with $\zeta_0 = 0$.
(iii) Time-reverse the output signal $\beta_\tau$, that is, construct $\beta_k$.


Thus if a system consists of a cascade of stable and unstable components, it can be 
realised by a combination of causal and non-causal components. This approach 
will be exploited in the realisation of smoothers subsequently. 


Example 11. Suppose that it is desired to realise the system $G(z) = G_2^H(z)G_1(z)$, in
which $G_1(z) = (z + 0.6)^{-1}$ and $G_2^H(z) = z(0.9z + 1)^{-1}$, that is, $G_2(z) = (z + 0.9)^{-1}$. This
system can be realised using the processes shown in Fig. 2.


[Figure: block diagram in which two "time-reverse transpose" operations enclose the backward component.]


Fig. 2. Realising an unstable $G(z) = G_2^H(z)G_1(z)$.
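The three-step procedure can be sketched for the backward component of the example above, taking $G_2(z) = (z + 0.9)^{-1}$, that is, scalar parameters $A = -0.9$, $B = C = 1$ and $D = 0$. The code below is a minimal illustration, not the author's implementation: the function names are hypothetical, and the result is compared with a direct evaluation of the anti-causal sum $\beta_k = D\alpha_k + \sum_{m>k} B A^{m-k-1} C \alpha_m$.

```python
def realise_adjoint_by_time_reversal(A, B, C, D, alpha):
    """Steps (i)-(iii): reverse the input, run the transposed system forwards, reverse the output."""
    alpha_rev = alpha[::-1]                  # (i) alpha indexed by tau = N - k
    zeta, beta_rev = 0.0, []                 # zeta_0 = 0
    for a in alpha_rev:                      # (ii) forward recursion in tau (scalar case)
        beta_rev.append(B * zeta + D * a)
        zeta = A * zeta + C * a
    return beta_rev[::-1]                    # (iii) back to the index k


def adjoint_direct(A, B, C, D, alpha):
    """Direct evaluation of the anti-causal response for comparison."""
    n = len(alpha)
    return [D * alpha[k] + sum(B * A ** (m - k - 1) * C * alpha[m] for m in range(k + 1, n))
            for k in range(n)]


A, B, C, D = -0.9, 1.0, 1.0, 0.0
alpha = [1.0, 0.0, 0.0, 2.0, -1.0]

beta_reversal = realise_adjoint_by_time_reversal(A, B, C, D, alpha)
beta_direct = adjoint_direct(A, B, C, D, alpha)
max_diff = max(abs(x - y) for x, y in zip(beta_reversal, beta_direct))

# An impulse at the final sample produces a response that decays backwards in time.
impulse_response = realise_adjoint_by_time_reversal(A, B, C, D, [0.0, 0.0, 0.0, 1.0])
print(max_diff, impulse_response)
```

Because the recursion in step (ii) uses the stable parameter $A = -0.9$, the computation remains bounded even though $G_2^H(z)$ has its pole outside the unit circle.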


2.2.15 Power Spectral Density 


Consider again a linear, time-invariant system $y = \mathcal{G}w$ and its corresponding
transfer function matrix $G(z)$. Then $\Phi_{yy}(z)$, the power spectral density of $y$, is
given by

$$\Phi_{yy}(z) = GQG^H(z), \tag{45}$$

which has the property $\Phi_{yy}(z) = \Phi_{yy}(z^{-1})$. From Parseval's Theorem (5), the
average total energy of $y(t)$ is given by

$$\int_{-e^{j\omega T}}^{e^{j\omega T}} \Phi_{yy}(z)\,dz = \int_{-\infty}^{\infty} |y_k|^2\,dk = \|y(t)\|_2^2 = E\{y^T(t)y(t)\}, \tag{46}$$

which equals the area under the power spectral density curve.


2.2.16 Spectral Factorisation


To avoid confusion with the z-transform variable, denote the noisy measurements
of $y(z) = G(z)w(z)$ by

$$u(z) = y(z) + v(z), \tag{47}$$

where $v(z) \in \mathbb{R}^p$ is the z-transform of an independent, zero-mean, stationary,
white measurement noise process with $E\{v_j v_k^T\} = R\delta_{jk}$. Let

$$\Phi_{uu}(z) = GQG^H(z) + R \tag{48}$$

denote the spectral density matrix of the measurements $u(t)$. A discrete-time
transfer function is said to be minimum phase if its zeros lie inside the unit circle.
Conversely, transfer functions having outside-unit-circle zeros are known as
non-minimum phase.


Suppose that $\Phi_{uu}(z)$ is a spectral density matrix of transfer functions possessing
equal order numerator and denominator polynomials that do not have roots on the
unit circle. Then the spectral factor matrix $\Delta(z)$ satisfies the following.
(i) $\Delta(z)\Delta^H(z) = \Phi_{uu}(z)$.
(ii) $\Delta(z)$ is causal, that is, the poles of $\Delta(z)$ are inside the unit circle.
(iii) $\Delta^{-1}(z)$ is causal, that is, the zeros of $\Delta(z)$, which are the poles of $\Delta^{-1}(z)$, are
inside the unit circle.


The problem of spectral factorisation within discrete-time Wiener filtering 
problems is studied in [10]. The roots of the transfer function polynomials need to 
be sorted into those inside the unit circle and those outside the unit circle. Spectral 
factors can be found using Levinson-Durbin and Schur algorithms, Cholesky 
decomposition, Riccati equation solution [11] and Newton-Raphson iteration [12]. 


Example 12. Applying the Bilinear Transform (15) to the continuous-time low-pass
plant $G(s) = (s + 1)^{-1}$ for a sample frequency of 2 Hz yields $G(z) = 0.2(z + 1)(z - 0.6)^{-1}$.
With $Q = R = 1$, the measurement spectral density (48) is

$$\Phi_{uu}(z) = \frac{(1.08z - 0.517)}{(z - 0.6)}\,\frac{(-0.517z + 1.08)}{(-0.6z + 1.0)}.$$

By inspection, the spectral factor $\Delta(z) = (1.08z - 0.517)(z - 0.6)^{-1}$ has inside-unit-circle
poles and zeros and satisfies $\Delta(z)\Delta^H(z) = \Phi_{uu}(z)$.
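For a first-order scalar plant the spectral factor can be found in closed form rather than by inspection. The following sketch is illustrative and rests on stated assumptions: the plant is written as $G(z) = g(z - \beta)(z - \alpha)^{-1}$ with real parameters, the spectral factor (denoted here by $\Delta(z)$) is sought in the form $(az - b)(z - \alpha)^{-1}$, and the function name is hypothetical. Matching coefficients of $z + z^{-1}$ in the numerator of the spectral density gives $a^2 + b^2$ and $ab$, from which $a$ and $b$ follow.

```python
import cmath
import math


def spectral_factor_first_order(g, beta, alpha, Q, R):
    """Return (a, b) with Delta(z) = (a z - b)/(z - alpha) and Delta Delta^H = G Q G^H + R.

    Assumes a scalar plant G(z) = g (z - beta)/(z - alpha) with real parameters.
    """
    n0 = g * g * Q * (1.0 + beta ** 2) + R * (1.0 + alpha ** 2)   # equals a^2 + b^2
    n1 = -(g * g * Q * beta + R * alpha)                           # equals -a b
    s_plus = math.sqrt(n0 - 2.0 * n1)                              # a + b
    s_minus = math.sqrt(n0 + 2.0 * n1)                             # a - b
    return 0.5 * (s_plus + s_minus), 0.5 * (s_plus - s_minus)


# Plant of the preceding example: G(z) = 0.2 (z + 1)/(z - 0.6), Q = R = 1.
g, beta, alpha, Q, R = 0.2, -1.0, 0.6, 1.0, 1.0
a, b = spectral_factor_first_order(g, beta, alpha, Q, R)

# On the unit circle, Delta^H(z) is the complex conjugate of Delta(z).
max_mismatch = 0.0
for i in range(1, 64):
    z = cmath.exp(1j * math.pi * i / 64.0)
    G = g * (z - beta) / (z - alpha)
    Delta = (a * z - b) / (z - alpha)
    max_mismatch = max(max_mismatch, abs(abs(Delta) ** 2 - (Q * abs(G) ** 2 + R)))

print(a, b, max_mismatch)
```

Taking the positive square roots guarantees $|b| \le a$, so the zero $b/a$ of the factor lies inside (or on) the unit circle, as required of a minimum-phase spectral factor.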


Example 13. Consider the high-pass plant $G(z) = 4.98(z - 0.6)(z + 0.99)^{-1}$ and $Q = R = 1$.
The spectral density is

$$\Phi_{uu}(z) = \frac{(5.39z - 2.58)}{(z + 0.99)}\,\frac{(-2.58z + 5.39)}{(0.99z + 1.0)}.$$

Thus the stable, minimum phase spectral factor is $\Delta(z) = (5.39z - 2.58)(z + 0.99)^{-1}$, since it
has inside-unit-circle poles and zeros.


2.2.17 Calculating Causal Parts 
Suppose that a discrete-time transfer function has the form

$$G(z) = c_0 + \sum_{i=1,\,|a_i|<1}^{n} \frac{d_i}{z - a_i} + \sum_{j=1,\,|b_j|>1}^{m} \frac{e_j}{z - b_j} = c_0 + G_{iucp}(z) + G_{oucp}(z), \tag{49}$$

where $c_0, d_i, e_j \in \mathbb{R}$, $G_{iucp}(z) = \sum_{i=1,\,|a_i|<1}^{n} \dfrac{d_i}{z - a_i}$ is the sum of partial fractions having
inside-unit-circle poles and $G_{oucp}(z) = \sum_{j=1,\,|b_j|>1}^{m} \dfrac{e_j}{z - b_j}$ is the sum of partial fractions
having outside-unit-circle poles. Assume that the roots of $G(z)$ are distinct and do
not lie on the unit circle. In this case the partial fraction coefficients $d_i$ and $e_j$
within (49) can be calculated from the numerator and denominator polynomials of
$G(z)$ via $d_i = (z - a_i)G(z)\big|_{z = a_i}$ and $e_j = (z - b_j)G(z)\big|_{z = b_j}$. Previously, in
continuous-time, the convention was to define constants to be causal. This is
consistent with ensuring that the non-causal part of the discrete-time transfer
function is zero at $z = 0$. Thus, the non-causal part of $G(z)$, denoted by $\{G(z)\}_-$, is
obtained as

$$\{G(z)\}_- = G_{oucp}(z) - G_{oucp}(0) \tag{50}$$

and the causal part of $G(z)$, denoted by $\{G(z)\}_+$, is whatever remains, that is,

$$\{G(z)\}_+ = G(z) - \{G(z)\}_- = c_0 + G_{iucp}(z) + G_{oucp}(0). \tag{51}$$


Hence, the causal part of a transfer function can be found by carrying out the
following three steps.
(i) If the transfer function is not strictly proper, that is, if the order of the
numerator is not less than the order of the denominator, perform synthetic
division to extract the constant term.
(ii) Expand out the (strictly proper) transfer function into the sum of partial
fractions (49).
(iii) Obtain the causal part from (51), namely, take the sum of the constant
term, the partial fractions with inside-unit-circle poles and the partial
fractions with outside-unit-circle poles evaluated at $z = 0$.

Example 14. Consider the strictly proper transfer function

$$G(z) = \frac{3z + 3.2}{z^2 + 2.6z + 1.2} = \frac{3z + 3.2}{(z + 0.6)(z + 2)} = \frac{1}{z + 0.6} + \frac{2}{z + 2}.$$

It follows from (50) and (51) that $\{G(z)\}_- = \dfrac{2}{z + 2} - 1 = \dfrac{-z}{z + 2}$ and
$\{G(z)\}_+ = \dfrac{1}{z + 0.6} + 1 = \dfrac{z + 1.6}{z + 0.6}$, respectively. It is easily
verified that $G(z) = \{G(z)\}_+ + \{G(z)\}_-$.


Example 15. Consider the proper transfer function

$$G(z) = \frac{2z^2 + 8.2z + 5.6}{z^2 + 2.6z + 1.2}.$$

Carrying out synthetic division results in $G(z) = 2 + \dfrac{1}{z + 0.6} + \dfrac{2}{z + 2}$. It follows
from (50) and (51) that $\{G(z)\}_- = \dfrac{2}{z + 2} - 1 = \dfrac{-z}{z + 2}$ and
$\{G(z)\}_+ = 2 + 1 + \dfrac{1}{z + 0.6} = \dfrac{3z + 2.8}{z + 0.6}$, respectively.
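The three steps can be sketched in code for transfer functions with distinct poles. The following is an illustrative sketch only: the helper names are hypothetical, the constant term $c_0$ is assumed to have been extracted beforehand by synthetic division, and the remaining strictly proper part is supplied as numerator coefficients (highest power first) together with its poles.

```python
def residue(num, poles, p):
    """Residue of num(z) / prod(z - p_i) at the simple pole p."""
    n = sum(c * p ** (len(num) - 1 - i) for i, c in enumerate(num))
    d = 1.0
    for q in poles:
        if q != p:
            d *= (p - q)
    return n / d


def causal_part(c0, num, poles):
    """Return z -> {G(z)}_+ for G(z) = c0 + num(z) / prod(z - p_i), following (51)."""
    inside = [(residue(num, poles, p), p) for p in poles if abs(p) < 1.0]
    outside = [(residue(num, poles, p), p) for p in poles if abs(p) > 1.0]
    g_oucp_at_0 = sum(e / (0.0 - b) for e, b in outside)   # outside-pole fractions at z = 0
    return lambda z: c0 + sum(d / (z - a) for d, a in inside) + g_oucp_at_0


# Strictly proper example: G(z) = (3z + 3.2) / ((z + 0.6)(z + 2)).
plus_a = causal_part(0.0, [3.0, 3.2], [-0.6, -2.0])
# Proper example: G(z) = 2 + (3z + 3.2) / ((z + 0.6)(z + 2)).
plus_b = causal_part(2.0, [3.0, 3.2], [-0.6, -2.0])

z = 1.5
print(plus_a(z), (z + 1.6) / (z + 0.6))
print(plus_b(z), (3.0 * z + 2.8) / (z + 0.6))
```

The printed pairs compare the computed causal parts with the closed forms stated in the two examples above at an arbitrary test point.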


2.3. Minimum-Mean-Square-Error Filtering 


2.3.1 Filter Derivation 


This section derives the optimal non-causal minimum-mean-square-error solution
for the problem configuration of Fig. 3. The derivation is identical to the
continuous-time case which is presented in Chapter 1. It is assumed that the
parameters of the transfer function $G_2(z) = C_2(zI - A)^{-1}B + D_2$ are known. Let
$Y_2(z)$, $W(z)$, $V(z)$ and $U(z)$ denote the z-transforms of a system's output, process
noise, measurement noise and observations, respectively. Then it follows from
(47) that the z-transform of the measurements is

$$U(z) = Y_2(z) + V(z). \tag{52}$$

Consider a fictitious reference system $G_1(z) = C_1(zI - A)^{-1}B + D_1$ as shown in Fig.
3. The problem is to design a stable filter transfer function $H(z)$ to calculate
estimates $\hat{Y}_1(z) = H(z)U(z)$ of $Y_1(z)$ so that the energy $\int_{-e^{j\omega T}}^{e^{j\omega T}} \tilde{Y}(z)\tilde{Y}^H(z)\,dz$ of the
estimation error

$$\tilde{Y}(z) = Y_1(z) - \hat{Y}_1(z) \tag{53}$$

is minimised. It can be seen from Fig. 3 that the estimation error is generated by
the system

$$\tilde{Y}(z) = -\begin{bmatrix} H(z) & HG_2(z) - G_1(z) \end{bmatrix}\begin{bmatrix} V(z) \\ W(z) \end{bmatrix}. \tag{54}$$


Fig. 3. The general z-domain filtering problem. 


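The error power spectrum density matrix is given by the covariance of $\tilde{Y}(z)$, that
is,

$$\Phi_{ee}(z) = \tilde{Y}(z)\tilde{Y}^H(z) = \begin{bmatrix} H(z) & HG_2(z) - G_1(z) \end{bmatrix}\begin{bmatrix} R & 0 \\ 0 & Q \end{bmatrix}\begin{bmatrix} H^H(z) \\ G_2^H H^H(z) - G_1^H(z) \end{bmatrix}$$
$$= G_1QG_1^H(z) - G_1QG_2^H H^H(z) - HG_2QG_1^H(z) + H\Delta\Delta^H H^H(z), \tag{55}$$

where

$$\Delta\Delta^H(z) = G_2QG_2^H(z) + R \tag{56}$$

is the spectral density matrix of the measurements. Completing the square within
(55) yields

$$\Phi_{ee}(z) = G_1QG_1^H(z) - G_1QG_2^H(\Delta\Delta^H)^{-1}G_2QG_1^H(z)$$
$$+ \left(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)\right)\left(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)\right)^H, \tag{57}$$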


in which $\Delta^{-H}(z) = (\Delta^H)^{-1}(z)$. It follows that the total energy of the error signal
can be expressed as

$$\int_{-e^{j\omega T}}^{e^{j\omega T}} \Phi_{ee}(z)\,dz = \int_{-e^{j\omega T}}^{e^{j\omega T}} \left[G_1QG_1^H(z) - G_1QG_2^H(\Delta\Delta^H)^{-1}G_2QG_1^H(z)\right]dz$$
$$+ \int_{-e^{j\omega T}}^{e^{j\omega T}} \left(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)\right)\left(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)\right)^H dz. \tag{58}$$

The first term on the right-hand-side of (58) is independent of $H(z)$ and represents
a lower bound of $\int_{-e^{j\omega T}}^{e^{j\omega T}} \Phi_{ee}(z)\,dz$. The second term on the right-hand-side of (58)
may be minimised by a judicious choice for $H(z)$.

Theorem 1: The optimal solution for the above linear time-invariant estimation
problem with measurements (52) and error (53) is

$$H(z) = G_1QG_2^H\Delta^{-H}\Delta^{-1}(z), \tag{59}$$

which minimises $\int_{-e^{j\omega T}}^{e^{j\omega T}} \Phi_{ee}(z)\,dz$.

Proof: The result follows by setting $H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)$ equal to the zero
matrix within (58).

By Parseval's theorem, the minimum mean-square-error solution (59) also
minimises $\|\tilde{y}(t)\|_2^2$. The solution (59) is non-causal because the factor $G_2^H\Delta^{-H}(z)$
possesses outside-unit-circle poles. This optimal non-causal solution is actually a smoother, which
can be realised by a combination of forward and backward processes.


The transfer function matrix of the optimal causal solution or filter is obtained by
setting the causal part of $H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)$ equal to the zero matrix, resulting in
$\{H\Delta(z)\}_+ = \{G_1QG_2^H\Delta^{-H}(z)\}_+$, that is, $H\Delta(z) = \{G_1QG_2^H\Delta^{-H}(z)\}_+$, which implies

$$H(z) = \left\{G_1QG_2^H(\Delta^H)^{-1}\right\}_+\Delta^{-1}(z). \tag{60}$$


2.3.2 Output Estimation 


In output estimation, it is desired to estimate the output $Y_2(z)$ from the
measurements $U(z)$, in which case the reference system is the same as the
generating system, as shown in Fig. 4. The optimal non-causal solution (59) with
$G_1(z) = G_2(z)$ becomes

$$H_{OE}(z) = G_2QG_2^H\Delta^{-H}\Delta^{-1}(z). \tag{61}$$

Substituting $G_2QG_2^H(z) = \Delta\Delta^H(z) - R$ into (61) leads to the alternative form

$$H_{OE}(z) = (\Delta\Delta^H - R)(\Delta\Delta^H)^{-1}(z) = I - R\Delta^{-H}\Delta^{-1}(z). \tag{62}$$


Fig. 4. The z-domain output estimation problem.


The solutions (61) and (62) are non-causal since $G_2^H(z)$ and $\Delta^{-H}(z)$ are non-causal.
The optimal causal filter for output estimation is obtained by
substituting $G_1(z) = G_2(z)$ into (60), namely,

$$H_{OE}(z) = \left\{G_2QG_2^H\Delta^{-H}\right\}_+\Delta^{-1}(z). \tag{63}$$

An alternative form arises by substituting $G_2QG_2^H(z) = \Delta\Delta^H(z) - R$ into (63), which
results in

$$H_{OE}(z) = \left\{\Delta(z) - R\Delta^{-H}\right\}_+\Delta^{-1}(z) = I - R\left\{\Delta^{-H}\right\}_+\Delta^{-1}(z). \tag{64}$$

In [10], it is recognised that $\{\Delta^{-H}(z)\}_+ = \lim_{z \to 0}\Delta^{-H}(z)$, which is equivalent to
$\{\Delta^{-H}(z)\}_+ = \Delta^{-H}(0)$. It follows that

$$H_{OE}(z) = I - R\Delta^{-H}(0)\Delta^{-1}(z), \tag{65}$$

which eliminates the need for calculating causal parts.


Example 16. Consider $G_2(z) = (z + 0.2)(z + 0.5)^{-1}$ together with $R = Q = 1$. The
spectral factor is $\Delta(z) = (1.43z + 0.489)(z + 0.5)^{-1}$, which leads to
$G_2QG_2^H\Delta^{-H}(z) = (0.2z^2 + 1.04z + 0.2)(0.489z^2 + 1.67z + 0.716)^{-1}$ and
$\{G_2QG_2^H\Delta^{-H}(z)\}_+ = (0.734z + 0.14)(z + 0.5)^{-1}$. Hence, from (63),
$H_{OE}(z) = (0.513z + 0.098)(z + 0.341)^{-1}$. The
same solution can be calculated using $\Delta^{-H}(0) = 0.698$ within (65).
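The shortcut (65) lends itself to a short numerical check for first-order scalar plants. The sketch below is an illustration under stated assumptions: the plant is parameterised as $G_2(z) = g(z - \beta)(z - \alpha)^{-1}$, the spectral factor (denoted here by $\Delta(z)$) is taken in the form $(az - b)(z - \alpha)^{-1}$ so that $\Delta^{-H}(0) = 1/a$, and the function names are hypothetical.

```python
import math


def spectral_factor_first_order(g, beta, alpha, Q, R):
    """Return (a, b) with Delta(z) = (a z - b)/(z - alpha) for G(z) = g (z - beta)/(z - alpha)."""
    n0 = g * g * Q * (1.0 + beta ** 2) + R * (1.0 + alpha ** 2)   # a^2 + b^2
    n1 = -(g * g * Q * beta + R * alpha)                           # -a b
    s_plus = math.sqrt(n0 - 2.0 * n1)
    s_minus = math.sqrt(n0 + 2.0 * n1)
    return 0.5 * (s_plus + s_minus), 0.5 * (s_plus - s_minus)


def output_estimator(g, beta, alpha, Q, R):
    """Coefficients (n1, n0, d0) of H_OE(z) = (n1 z + n0)/(z + d0) from the shortcut (65)."""
    a, b = spectral_factor_first_order(g, beta, alpha, Q, R)
    k = R / a                       # R * Delta^{-H}(0), since Delta^{-H}(0) = 1/a
    # H = 1 - k (z - alpha)/(a z - b) = ((a - k) z - (b - k alpha)) / (a z - b)
    return (a - k) / a, -(b - k * alpha) / a, -b / a


# Plant of the example above: G2(z) = (z + 0.2)/(z + 0.5), Q = 1.
coeffs_unit_noise = output_estimator(1.0, -0.2, -0.5, 1.0, 1.0)
coeffs_low_noise = output_estimator(1.0, -0.2, -0.5, 1.0, 0.001)
print(coeffs_unit_noise)
print(coeffs_low_noise)
```

With $R = 1$ the coefficients agree with the filter stated in the example to the quoted precision, and with $R = 0.001$ the numerator and denominator nearly cancel, so that the filter approaches unity gain.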


When the measurement noise becomes negligibly small, the output estimator
approaches a short circuit, that is,

$$\lim_{R \to 0,\; e^{j\omega T} \to 0} |H_{OE}(z)| = I. \tag{66}$$

The above observation can be verified by substituting $R = 0$ into (65). This
asymptote is consistent with intuition, that is, when the measurements are perfect,
output estimation will be superfluous.


Example 17. Substituting $R = 0.001$ within the preceding output estimation example
yields the filter $H(z) = (0.999z + 0.2)(z + 0.2)^{-1}$, which illustrates the low
measurement noise asymptote (66).


2.3.3. Input Estimation 


In input estimation or equalisation problems, $G_2(z)$ is known as the channel model
and it is desired to estimate the input process $w(t)$, as depicted in Fig. 5. The
simplification of the optimum non-causal solution (59) for the case of $G_1(z) = I$ is

$$H_{IE}(z) = QG_2^H\Delta^{-H}\Delta^{-1}(z). \tag{67}$$

Assume that: the channel model $G_2(z)$ is proper, that is, the order of the numerator
is the same as the order of the denominator; and that the channel model $G_2(z)$ is
stable and minimum phase, that is, its poles and zeros are inside the unit circle.
The causal equaliser for proper, stable, minimum-phase channels is obtained by
substituting $G_1(z) = I$ into (60)

$$H_{IE}(z) = \left\{QG_2^H\Delta^{-H}\right\}_+\Delta^{-1}(z) = QG_2^H(0)\Delta^{-H}(0)\Delta^{-1}(z). \tag{68}$$

Under the above assumptions, the causal equaliser may be written equivalently as

$$H_{IE}(z) = \left\{G_2^{-1}G_2QG_2^H\Delta^{-H}\right\}_+\Delta^{-1}(z) = \left\{G_2^{-1}(\Delta\Delta^H - R)\Delta^{-H}\right\}_+\Delta^{-1}(z) \tag{69}$$
$$= G_2^{-1}\left(I - R\left\{\Delta^{-H}\right\}_+\Delta^{-1}(z)\right) = G_2^{-1}\left(I - R\Delta^{-H}(0)\Delta^{-1}(z)\right). \tag{70}$$

Thus, the equaliser is equivalent to a product of the channel inverse and the output
estimator. It follows that when the measurement noise becomes negligibly small,
the equaliser estimates the inverse of the system model, that is,

$$\lim_{R \to 0} H_{IE}(z) = G_2^{-1}(z). \tag{71}$$

The above observation follows by substituting $R = 0$ into (69). In other words, if
the channel model is invertible and the signal to noise ratio is sufficiently high, the
equaliser will estimate $w(t)$. When measurement noise is present, the solution
trades off channel inversion and filtering. In the high measurement noise case, the
equaliser approaches an open circuit, that is,

$$\lim_{Q \to 0,\; e^{j\omega T} \to 0} |H_{IE}(z)| = 0. \tag{72}$$

The above observation can be verified by substituting $\Delta\Delta^H = R$ into (70). Thus,
when the equalisation problem is dominated by measurement noise, the estimation
error is minimised by ignoring the data.


Fig. 5. The z-domain input estimation problem. 


Example 18. Consider the high-pass plant $G(s) = 100(s + 0.1)(s + 10)^{-1}$.
Application of the bilinear transform for a sample frequency of 2 Hz yields
$G_2(z) = (29.2857z - 27.8571)(z + 0.4286)^{-1}$. With $Q = 1$ and $R = 0.001$, the spectral
factor is $\Delta(z) = (29.2861z - 27.8568)(z + 0.4286)^{-1}$. From (67),
$H_{IE}(z) = (z + 0.4286)(29.2861z - 27.8568)^{-1}$, which is low-pass, that is, it approximates
the inverse of the high-pass channel and illustrates (71).


Example 19. Applying the bilinear transform for a sample frequency of 2 Hz to the
low-pass plant $G_2(s) = (s + 10)(s + 0.1)^{-1}$ results in $G_2(z) = (3.4146z + 1.4634)(z - 0.9512)^{-1}$.
With $Q = 1$ and $R = 0.001$, the spectral factor is $\Delta(z) = (3.4151z + 1.4629)(z - 0.9512)^{-1}$.
From (67), $H_{IE}(z) = (z - 0.9512)(3.4156z + 1.4631)^{-1}$,
which is high-pass, that is, it approximates the inverse of the low-pass channel and is
consistent with (71).
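The low measurement noise asymptote (71) can be examined numerically for the first of the two examples above. The sketch below is illustrative: the function names are hypothetical, the plant is assumed to have the first-order form $k(s + \text{zero})(s + \text{pole})^{-1}$, and the non-causal equaliser (67) is evaluated on the unit circle, where for a scalar channel it reduces to $Q\,\overline{G_2}/(Q|G_2|^2 + R)$.

```python
def bilinear_first_order(k, zero, pole, fs):
    """Map G(s) = k (s + zero)/(s + pole) to g (z - beta)/(z - alpha) using s = 2 fs (z - 1)/(z + 1)."""
    c = 2.0 * fs
    g = k * (c + zero) / (c + pole)
    beta = (c - zero) / (c + zero)
    alpha = (c - pole) / (c + pole)
    return g, beta, alpha


def channel(g, beta, alpha, z):
    """Discrete-time channel response G2(z)."""
    return g * (z - beta) / (z - alpha)


def noncausal_equaliser(g, beta, alpha, Q, R, z):
    """Scalar form of (67) on |z| = 1: H = Q conj(G2) / (Q |G2|^2 + R)."""
    G = complex(channel(g, beta, alpha, z))
    return Q * G.conjugate() / (Q * abs(G) ** 2 + R)


# High-pass plant G(s) = 100 (s + 0.1)/(s + 10), sampled at 2 Hz, with Q = 1 and R = 0.001.
g, beta, alpha = bilinear_first_order(100.0, 0.1, 10.0, 2.0)
Q, R = 1.0, 0.001

for z in (1.0, -1.0):               # zero frequency and the Nyquist frequency
    G = channel(g, beta, alpha, z)
    H = noncausal_equaliser(g, beta, alpha, Q, R, z)
    print(z, abs(G), abs(H), abs(H * G))
```

At both frequencies the product of the equaliser and channel responses is close to unity, which is the behaviour described by (71): where the channel gain is large the equaliser gain is correspondingly small, and vice versa.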


2.4 Chapter Summary 
Systems are written in the time-domain as difference equations

$$a_n y_{k+n} + a_{n-1}y_{k+n-1} + \cdots + a_1 y_{k+1} + a_0 y_k = b_m w_{k+m} + b_{m-1}w_{k+m-1} + \cdots + b_1 w_{k+1} + b_0 w_k,$$

which can be expressed as polynomial transfer functions in the z-transform variable

$$Y(z) = \left[\frac{b_m z^m + b_{m-1}z^{m-1} + \cdots + b_1 z + b_0}{a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0}\right]W(z) = G(z)W(z).$$

It can be seen that knowledge of a system's difference equation is sufficient to
identify its transfer function. The optimal Wiener solution minimises the energy
of the error (and the mean-square-error), and the main results are summarised in
Table 1. The noncausal (or smoother) solution has unstable factors and can only
be realised by a combination of forward and backward processes.


                        | ASSUMPTIONS | MAIN RESULTS
Signals and systems     | $E\{w_k\} = E\{W(z)\} = E\{v_k\} = E\{V(z)\} = 0$. $E\{w_k w_k^T\} = E\{W(z)W^T(z)\} = Q > 0$ and $E\{v_k v_k^T\} = E\{V(z)V^T(z)\} = R > 0$ are known. $A$, $B$, $C_1$, $C_2$, $D_1$ and $D_2$ are known. $G_1(z)$ and $G_2(z)$ are stable, i.e., $|\lambda_i(A)| < 1$. | $G_1(z) = C_1(zI - A)^{-1}B + D_1$, $G_2(z) = C_2(zI - A)^{-1}B + D_2$
Spectral factorisation  | $\Delta(z)$ and $\Delta^{-1}(z)$ are stable, i.e., the poles and zeros of $\Delta(z)$ are inside the unit circle. | $\Delta\Delta^H(z) = G_2QG_2^H(z) + R$
Non-causal solution     | | $H(z) = G_1QG_2^H(\Delta^H)^{-1}\Delta^{-1}(z)$
Causal solution         | | $H(z) = \{G_1QG_2^H(\Delta^H)^{-1}\}_+\Delta^{-1}(z)$


Table 1. Main results for the discrete-time general filtering problem.


It is noted that $\{\Delta^{-H}(z)\}_+ = \lim_{z \to 0}\Delta^{-H}(z) = \Delta^{-H}(0)$, which can simplify calculating
causal parts. For example, in output estimation problems where $G_1(z) = G_2(z)$, the
minimum-mean-square-error solution is $H_{OE}(z) = I - R\Delta^{-H}(0)\Delta^{-1}(z)$. In the
single-input-single-output case, when the measurement noise becomes negligible,
the output estimator approaches a short circuit. Conversely, when the
single-input-single-output problem is dominated by measurement noise, the output estimator
approaches an open circuit.


In input estimation problems, $G_1(z) = I$. If the channel model is invertible, the
optimal causal equaliser is given by $H_{IE}(z) = QG_2^H(0)\Delta^{-H}(0)\Delta^{-1}(z)$. When the
measurement noise becomes negligible, that is, $\Delta^{-1}(z) \approx G_2^{-1}(z)$, the optimal
equaliser approaches the channel inverse. Conversely, when the problem is
dominated by measurement noise, the equaliser approaches an open circuit.


2.5. Problems 


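Problem 1. Consider the error spectral density matrix

$$\Phi_{ee}(z) = \left[H\Delta - G_1QG_2^H(\Delta^H)^{-1}\right]\left[H\Delta - G_1QG_2^H(\Delta^H)^{-1}\right]^H(z) + \left[G_1QG_1^H - G_1QG_2^H(\Delta\Delta^H)^{-1}G_2QG_1^H\right](z).$$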
(a) Derive the optimal non-causal solution. 
(b) Derive the optimal causal filter from (a). 
(c) Derive the optimal non-causal output estimator. 
(d) Derive the optimal causal filter from (c). 
(e) Derive the optimal non-causal input estimator. 
(f) Derive the optimal causal equaliser assuming that the channel inverse exists. 


Problem 2. Derive the asymptotes for the following single-input-single-output 
estimation problems. 

(a) Non-causal output estimation at R = 0.
(b) Non-causal output estimation at Q = 0.
(c) Causal output estimation at R = 0.
(d) Causal output estimation at Q = 0.
(e) Non-causal input estimation at R = 0.
(f) Non-causal input estimation at Q = 0.
(g) Causal input estimation at R = 0.
(h) Causal input estimation at Q = 0.


Problem 3. In respect of the output estimation problem with $G(z) = (z - \beta)(z - \alpha)^{-1}$,
$\alpha = -0.5$, $\beta = -0.3$ and $Q = 1$, verify the following.
(a) $R = 10$ yields $H(z) = (0.0948z + 0.0272)(z + 0.4798)^{-1}$.
(b) $R = 1$ yields $H(z) = (0.5059z + 0.1482)(z + 0.3953)^{-1}$.
(c) $R = 0.1$ yields $H(z) = (0.9094z + 0.2717)(z + 0.3170)^{-1}$.
(d) $R = 0.01$ yields $H(z) = (0.9901z + 0.2969)(z + 0.3018)^{-1}$.
(e) $R = 0.001$ yields $H(z) = (0.9990z + 0.2997)(z + 0.3002)^{-1}$.


Problem 4. In respect of the input estimation problem with $G(z) = (z - \beta)(z - \alpha)^{-1}$,
$\alpha = -0.1$, $\beta = -0.9$ and $Q = 1$, verify the following.
(a) $R = 10$ yields $H(z) = (z + 0.1)(11.5988z + 1.9000)^{-1}$.
(b) $R = 1$ yields $H(z) = (z + 0.1)(2.4040z + 1.0000)^{-1}$.
(c) $R = 0.1$ yields $H(z) = (z + 0.1)(1.2468z + 0.9100)^{-1}$.
(d) $R = 0.01$ yields $H(z) = (z + 0.1)(1.0381z + 0.9010)^{-1}$.
(e) $R = 0.001$ yields $H(z) = (z + 0.1)(1.0043z + 0.9001)^{-1}$.


2.6 Glossary 


The following terms have been introduced within this section.

$k$: The integer-valued time variable. For example, $k \in (-\infty, \infty)$ and $k \in (0, \infty)$ denote $-\infty < k < \infty$ and $0 \le k < \infty$, respectively.

$w_k \in \mathbb{R}^n$: A discrete-time, real-valued, n-element stochastic input signal.

$w$: The set of $w_k$ over a prescribed interval.

$y = \mathcal{G}w$: The output of a linear system $\mathcal{G}$ that operates on an input signal $w$.

$A, B, C, D$: Time-invariant state space matrices of appropriate dimension. The system $\mathcal{G}$ is assumed to have the realisation $x_{k+1} = Ax_k + Bw_k$, $y_k = Cx_k + Dw_k$ in which $w_k$ is known as the process noise or input signal.

$v_k$: A stationary stochastic measurement noise signal.

$\delta_{jk}$: The Kronecker delta function.

$Q, R$: Time-invariant covariance matrices of stochastic signals $w_k$ and $v_k$, respectively.

$Y(z)$: The z-transform of a discrete-time signal $y_k$.

$G(z)$: The transfer function matrix of the system $\mathcal{G}$. For example, the transfer function matrix of the system $x_{k+1} = Ax_k + Bw_k$, $y_k = Cx_k + Dw_k$ is given by $G(z) = C(zI - A)^{-1}B + D$.

$\langle v, w \rangle$: The inner product of two discrete-time signals $v$ and $w$ which is defined by $\langle v, w \rangle = \sum_{k=-\infty}^{\infty} v_k^T w_k$.

$\|w\|_2$: The 2-norm of the discrete-time signal $w$ which is defined by $\|w\|_2 = \sqrt{\langle w, w \rangle} = \sqrt{\sum_{k=-\infty}^{\infty} w_k^T w_k}$.

$\ell_2$: The set of discrete-time signals having finite 2-norm, which is known as the Lebesgue 2-space (see [3]).

Asymptotic stability: A linear discrete-time system $\mathcal{G}$ is said to be asymptotically stable if its output $y \in \ell_2$ for any $w \in \ell_2$. If the state matrix eigenvalues are inside the unit circle, or equivalently if the transfer function's poles are inside the unit circle, then the system is stable.

$T_s$: Sample period.

$\mathcal{G}^H$: The adjoint of $\mathcal{G}$. The adjoint of a system having the state-space parameters $\{A, B, C, D\}$ is a system parameterised by $\{A^T, -C^T, -B^T, D^T\}$.

$G^H(z)$: The adjoint (or Hermitian transpose) of the transfer function matrix $G(z)$.

$\Phi_{ee}(z)$: The spectral density matrix of the error $\tilde{Y}(z)$.

$\Delta(z)$: The spectral factor of $\Phi_{uu}(z)$ which satisfies $\Delta\Delta^H(z) = GQG^H(z) + R$. For brevity denote $\Delta^{-H}(z) = (\Delta^H)^{-1}(z)$.

$G^{-1}(z)$: The inverse of the transfer function matrix $G(z)$.

$G^{-H}(z)$: The inverse of the adjoint transfer function matrix $G^H(z)$.

$\{G(z)\}_+$: The causal part of the transfer function matrix $G(z)$.

$H(z)$: Transfer function matrix of the minimum mean-square-error solution.

$H_{OE}(z)$: Transfer function matrix of the minimum mean-square-error solution specialised for output estimation.

$H_{IE}(z)$: Transfer function matrix of the minimum mean-square-error solution specialised for input estimation.


3. Continuous-Time, Minimum-Variance Filtering


3.1 Introduction 


Rudolf E. Kalman studied discrete-time linear dynamic systems for his master’s 
thesis at MIT in 1954. He commenced work at the Research Institute for 
Advanced Studies (RIAS) in Baltimore during 1957 and nominated Richard S. 
Bucy to join him in 1958 [1]. Bucy recognised that the nonlinear ordinary 
differential equation studied by an Italian mathematician, Count Jacopo F. Riccati, 
in around 1720, now called the Riccati equation, is equivalent to the Wiener-Hopf 
equation for the case of finite dimensional systems [1], [2]. In November 1958, 
Kalman recast the frequency domain methods developed by Norbert Wiener and 
Andrei N. Kolmogorov in the 1940s to state-space form [2]. Kalman noted in his 
1960 paper [3] that generalising the Wiener solution to nonstationary problems 
was difficult, which motivated his development of the optimal discrete-time filter 
in a state-space framework. He described the continuous-time version with Bucy 
in 1961 [4] and published a generalisation in 1963 [5]. Bucy later investigated the 
monotonicity and stability of the underlying Riccati equation [6]. The continuous- 
time minimum-variance filter is now commonly attributed to both Kalman and 
Bucy. 


Compared to the Wiener filter, Kalman’s state-space approach has the following 
advantages. 
• It is applicable to time-varying problems. 
• As noted in [7], [8], the state-space parameters can be linearisations of 
  nonlinear models. 
• The burdens of spectral factorisation and pole-zero cancellation are 
  replaced by the easier task of solving a Riccati equation. 
• It is a more intuitive model-based approach in which the estimated states 
  correspond to those within the signal generation process. 


“What a weak, credulous, incredulous, unbelieving, superstitious, bold, frightened, what a ridiculous 
world ours is, as far as concerns the mind of man. How full of inconsistencies, contradictions and 
absurdities it is. I declare that taking the average of many minds that have recently come before me ... I 
should prefer the obedience, affections and instinct of a dog before it.” Michael Faraday 
Kalman’s research at the RIAS was concerned with estimation and control for 
aerospace systems and was funded by the Air Force Office of Scientific 
Research. His explanation of why the dynamics-based Kalman filter is more 
important than the purely stochastic Wiener filter is that “Newton is more 
important than Gauss” [1]. The continuous-time Kalman filter produces state 
estimates $\hat{x}(t)$ from the solution of a simple differential equation 

$$\dot{\hat{x}}(t) = A(t)\hat{x}(t) + K(t)\big(z(t) - C(t)\hat{x}(t)\big),$$

in which it is tacitly assumed that the model is correct and that the noises are 
zero-mean, white and uncorrelated. It is straightforward to include nonzero means, 
coloured and correlated noises. In practice, the true model can be elusive but a 
simple (low-order) solution may return a cost benefit. 


The Kalman filter can be derived in many different ways. In an early account [3], 
a quadratic cost function was minimised using orthogonal projections. Other 
derivation methods include deriving a maximum a posteriori estimate, using Itô’s 
calculus, calculus-of-variations, dynamic programming, invariant imbedding and 
from the Wiener-Hopf equation [6] - [17]. This chapter provides a brief derivation 
of the optimal filter using a conditional mean (or equivalently, a least mean square 
error) approach. 


The developments begin by introducing a time-varying state-space model. Next, 
the state transition matrix is defined, which is used to derive a Lyapunov 
differential equation. The Kalman filter follows immediately from a conditional 
mean formula. Its filter gain is obtained by solving a Riccati differential equation 
corresponding to the estimation error system. Generalisations for problems 
possessing deterministic inputs, correlated process and measurement noises, and 
direct feedthrough terms are described subsequently. Finally, it is shown that the 
Kalman filter reverts to the Wiener filter when the problems are time-invariant. 


Fig. 1. The continuous-time system $\mathcal{G}$ operates on the input signal $w(t)$ and produces the output 
signal $y(t)$. 


“A great deal of my work is just playing with equations and seeing what they give.” Paul Adrien 
Maurice Dirac 
3.1 Prerequisites 


3.1.1 The Time-varying Signal Model 


The focus initially is on time-varying problems over a finite time interval $t \in [0, T]$. 
A system $\mathcal{G}$ is assumed to have the state-space representation 

$$\dot{x}(t) = A(t)x(t) + B(t)w(t), \tag{1}$$
$$y(t) = C(t)x(t) + D(t)w(t), \tag{2}$$

where $A(t) \in \mathbb{R}^{n \times n}$, $B(t) \in \mathbb{R}^{n \times m}$, $C(t) \in \mathbb{R}^{p \times n}$, 
$D(t) \in \mathbb{R}^{p \times m}$ and $w(t) \in \mathbb{R}^{m}$ is a zero-mean white process noise with 
$E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, in which $\delta(t)$ is the Dirac delta function. 
This system is depicted in Fig. 1. In many problems of interest, signals are band-limited, 
that is, the direct feedthrough matrix, $D(t)$, is zero. Therefore, the simpler case of 
$D(t) = 0$ is addressed first and the inclusion of a nonzero $D(t)$ is considered afterwards. 


3.1.2. The State Transition Matrix 


The state transition matrix is introduced below which concerns the linear 
differential equation (1). 


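Lemma 1: The equation (1) has the solution 

$$x(t) = \Phi(t, t_0)x(t_0) + \int_{t_0}^{t} \Phi(t, s)B(s)w(s)\,ds, \tag{3}$$

where the state transition matrix, $\Phi(t, t_0)$, satisfies 

$$\dot{\Phi}(t, t_0) = \frac{d\Phi(t, t_0)}{dt} = A(t)\Phi(t, t_0), \tag{4}$$

with boundary condition 

$$\Phi(t, t) = I. \tag{5}$$

Proof: Differentiating both sides of (3) and using Leibniz’s rule, that is, 

$$\frac{\partial}{\partial t}\int_{\alpha(t)}^{\beta(t)} f(t, \tau)\,d\tau = \int_{\alpha(t)}^{\beta(t)} \frac{\partial f(t, \tau)}{\partial t}\,d\tau - f(t, \alpha)\frac{d\alpha(t)}{dt} + f(t, \beta)\frac{d\beta(t)}{dt},$$

gives 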


“Life is good for only two things, discovering mathematics and teaching mathematics." Siméon Denis 
Poisson 
$$\dot{x}(t) = \dot{\Phi}(t, t_0)x(t_0) + \int_{t_0}^{t} \dot{\Phi}(t, \tau)B(\tau)w(\tau)\,d\tau + \Phi(t, t)B(t)w(t). \tag{6}$$

Substituting (4) and (5) into the right-hand-side of (6) results in 

$$\dot{x}(t) = A(t)\left(\Phi(t, t_0)x(t_0) + \int_{t_0}^{t} \Phi(t, \tau)B(\tau)w(\tau)\,d\tau\right) + B(t)w(t). \tag{7}$$
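Lemma 1 can be checked numerically. The following is a minimal sketch, assuming a hypothetical scalar model with $A(t) = -t$, $B = 1$ and a smooth deterministic stand-in $w(t) = \cos 3t$ for the input (none of these choices come from the text). For this $A(t)$ the transition matrix is $\Phi(t, s) = e^{-(t^2 - s^2)/2}$, which satisfies (4) and (5). The sketch evaluates (3) by quadrature and compares it with a direct numerical integration of (1).

```python
import math

def a(t):
    """Hypothetical scalar state 'matrix' A(t) = -t (illustrative assumption)."""
    return -t

def phi(t, s):
    """Transition 'matrix' for A(t) = -t: dPhi/dt = A(t) Phi, Phi(s, s) = 1."""
    return math.exp(-(t * t - s * s) / 2.0)

def w(t):
    """Smooth deterministic stand-in for the input signal (assumption)."""
    return math.cos(3.0 * t)

B = 1.0  # hypothetical input gain

def x_from_lemma1(t, t0, x0, steps=4000):
    """Evaluate the right-hand side of (3) with the trapezoidal rule."""
    h = (t - t0) / steps
    total = 0.0
    for k in range(steps + 1):
        s = t0 + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * phi(t, s) * B * w(s)
    return phi(t, t0) * x0 + h * total

def x_from_ode(t, t0, x0, steps=4000):
    """Integrate (1) directly with a classical fourth-order Runge-Kutta scheme."""
    h = (t - t0) / steps

    def f(tt, xx):
        return a(tt) * xx + B * w(tt)

    x, tt = x0, t0
    for _ in range(steps):
        k1 = f(tt, x)
        k2 = f(tt + h / 2, x + h * k1 / 2)
        k3 = f(tt + h / 2, x + h * k2 / 2)
        k4 = f(tt + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tt += h
    return x

# The two values should agree to within the quadrature error.
print(x_from_lemma1(2.0, 0.0, 1.0), x_from_ode(2.0, 0.0, 1.0))
```

The two routes agree to within the discretisation error, which is what Lemma 1 asserts for this particular model.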


3.1.3 The Lyapunov Differential Equation 


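The mathematical expectation, $E\{x(t)x^T(\tau)\}$ of $x(t)x^T(\tau)$, is required below, which 
is defined as 

$$E\{x(t)x^T(\tau)\} = \int_{-\infty}^{\infty} x(t)x^T(\tau)\, f_{xx}\big(x(t)x^T(\tau)\big)\,dx(t), \tag{8}$$

where $f_{xx}(x(t)x^T(\tau))$ is the probability density function of $x(t)x^T(\tau)$. A useful 
property of expectations is demonstrated in the following example. 

Example 1. Suppose that $x(t)$ is a stochastic random variable and $h(t)$ is a 
continuous function, then 

$$E\left\{\int_a^b h(t)x(t)x^T(\tau)\,dt\right\} = \int_a^b h(t)E\{x(t)x^T(\tau)\}\,dt. \tag{9}$$

To verify this, expand the left-hand-side of (9) to give 

$$E\left\{\int_a^b h(t)x(t)x^T(\tau)\,dt\right\} = \int_{-\infty}^{\infty}\left(\int_a^b h(t)x(t)x^T(\tau)\,dt\right) f_{xx}\big(x(t)x^T(\tau)\big)\,dx(t)$$
$$\qquad = \int_{-\infty}^{\infty}\int_a^b h(t)x(t)x^T(\tau)\, f_{xx}\big(x(t)x^T(\tau)\big)\,dt\,dx(t). \tag{10}$$

Using Fubini’s theorem, that is, $\int_c^d\int_a^b g(x, y)\,dx\,dy = \int_a^b\int_c^d g(x, y)\,dy\,dx$, within (10) 
results in 

$$E\left\{\int_a^b h(t)x(t)x^T(\tau)\,dt\right\} = \int_a^b\int_{-\infty}^{\infty} h(t)x(t)x^T(\tau)\, f_{xx}\big(x(t)x^T(\tau)\big)\,dx(t)\,dt$$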


$$\qquad = \int_a^b h(t)\int_{-\infty}^{\infty} x(t)x^T(\tau)\, f_{xx}\big(x(t)x^T(\tau)\big)\,dx(t)\,dt. \tag{11}$$

The result (9) follows from the definition (8) within (11). 


“It is a mathematical fact that the casting of this pebble from my hand alters the centre of gravity of the 
universe.” Thomas Carlyle 


The Dirac delta function, $\delta(t) = \begin{cases} \infty, & t = 0 \\ 0, & t \neq 0 \end{cases}$, satisfies the identity 
$\int_{-\infty}^{\infty} \delta(t)\,dt = 1$. In the foregoing development, use is made of the partitioning 

$$\int_{-\infty}^{0} \delta(t)\,dt = \int_{0}^{\infty} \delta(t)\,dt = 0.5. \tag{12}$$


Lemma 2: In respect of equation (1), assume that $w(t)$ is a zero-mean white 
process with $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$ that is uncorrelated with $x(t_0)$, namely, 
$E\{w(t)x^T(t_0)\} = 0$. Then the covariances $P(t, \tau) = E\{x(t)x^T(\tau)\}$ and 
$\dot{P}(t, \tau) = \frac{d}{dt}E\{x(t)x^T(\tau)\}$ satisfy the Lyapunov differential equation 

$$\dot{P}(t, \tau) = A(t)P(t, \tau) + P(t, \tau)A^T(\tau) + B(t)Q(t)B^T(t). \tag{13}$$

Proof: Using (1) within $\frac{d}{dt}E\{x(t)x^T(\tau)\} = E\{\dot{x}(t)x^T(\tau) + x(t)\dot{x}^T(\tau)\}$ yields 

$$\dot{P}(t, \tau) = E\{A(t)x(t)x^T(\tau) + B(t)w(t)x^T(\tau)\} + E\{x(t)x^T(\tau)A^T(\tau) + x(t)w^T(\tau)B^T(\tau)\}$$
$$\qquad = A(t)P(t, \tau) + P(t, \tau)A^T(\tau) + E\{B(t)w(t)x^T(\tau)\} + E\{x(t)w^T(\tau)B^T(\tau)\}. \tag{14}$$

It follows from (1) and (3) that 

$$E\{B(t)w(t)x^T(\tau)\} = B(t)E\{w(t)x^T(0)\Phi^T(t, 0)\} + B(t)E\left\{\int_0^t w(t)w^T(\tau)B^T(\tau)\Phi^T(t, \tau)\,d\tau\right\}$$
$$\qquad = B(t)E\{w(t)x^T(0)\}\Phi^T(t, 0) + B(t)\int_0^t E\{w(t)w^T(\tau)\}B^T(\tau)\Phi^T(t, \tau)\,d\tau. \tag{15}$$


“Genius is two percent inspiration, ninety-eight percent perspiration.” Thomas Alva Edison 
The assumptions $E\{w(t)x^T(t_0)\} = 0$ and $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$ together with 
(15) lead to 

$$E\{B(t)w(t)x^T(\tau)\} = B(t)Q(t)\int_0^t \delta(t - \tau)B^T(\tau)\Phi^T(t, \tau)\,d\tau = 0.5\,B(t)Q(t)B^T(t). \tag{16}$$

The above Lyapunov differential equation follows by substituting (16) into (14). 
In the case $\tau = t$, denote $P(t, t) = E\{x(t)x^T(t)\}$ and $\dot{P}(t, t) = \frac{d}{dt}E\{x(t)x^T(t)\}$. Then 
the corresponding Lyapunov differential equation is written as 

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t). \tag{17}$$
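The behaviour of (17) is easy to inspect in the scalar, time-invariant case, where it reads $\dot{P} = 2aP + b^2 q$ and has the closed-form solution $P(t) = P_{ss} + (P(0) - P_{ss})e^{2at}$ with $P_{ss} = -b^2 q / (2a)$ for $a < 0$. The sketch below, which assumes hypothetical values $a = -1$, $b = 1$, $q = 2$, integrates (17) with Euler's method and compares the result with that closed form.

```python
import math

def lyapunov_euler(a, b, q, p0, dt, steps):
    """Euler integration of the scalar form of (17): dP/dt = 2 a P + b q b."""
    p = p0
    for _ in range(steps):
        p += dt * (2.0 * a * p + b * q * b)
    return p

def lyapunov_exact(a, b, q, p0, t):
    """Closed-form solution of the scalar Lyapunov equation for a < 0."""
    p_ss = -b * q * b / (2.0 * a)          # steady-state state covariance
    return p_ss + (p0 - p_ss) * math.exp(2.0 * a * t)

# Hypothetical parameters: a = -1, b = 1, q = 2, P(0) = 0.
print(lyapunov_euler(-1.0, 1.0, 2.0, 0.0, 1e-3, 1000))   # Euler value at t = 1
print(lyapunov_exact(-1.0, 1.0, 2.0, 0.0, 1.0))          # exact value at t = 1
```

For a stable scalar model the state covariance settles at $P_{ss}$, which is the value the Euler iteration approaches as the number of steps grows.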


3.1.4 Conditional Expectations 
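The minimum-variance filter derivation that follows employs a conditional 
expectation formula, which is set out as follows. Consider a stochastic vector 
$[x^T(t) \;\; y^T(t)]^T$ having means and covariances 

$$E\left\{\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}\right\} = \begin{bmatrix} \bar{x} \\ \bar{y} \end{bmatrix} \tag{18}$$

and 

$$E\left\{\begin{bmatrix} x(t) - \bar{x} \\ y(t) - \bar{y} \end{bmatrix}\begin{bmatrix} x^T(t) - \bar{x}^T & y^T(t) - \bar{y}^T \end{bmatrix}\right\} = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}, \tag{19}$$

respectively, where $\Sigma_{yx} = \Sigma_{xy}^T$. Suppose that it is desired to obtain an estimate of 
$x(t)$ given $y(t)$, denoted by $E\{x(t) \mid y(t)\}$, which minimises 
$E\{(x(t) - E\{x(t) \mid y(t)\})(x(t) - E\{x(t) \mid y(t)\})^T\}$. A standard approach (e.g., see [18]) is to 
assume that the solution for $E\{x(t) \mid y(t)\}$ is affine to $y(t)$, namely, 

$$E\{x(t) \mid y(t)\} = Ay(t) + b, \tag{20}$$

where $A$ and $b$ are unknowns to be found. It follows from (20) that 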


“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, 
they do not refer to reality.” Albert Einstein 
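$$E\{(x(t) - E\{x(t) \mid y(t)\})(x(t) - E\{x(t) \mid y(t)\})^T\}$$
$$\qquad = E\{x(t)x^T(t) - x(t)y^T(t)A^T - x(t)b^T - Ay(t)x^T(t) + Ay(t)y^T(t)A^T + Ay(t)b^T - bx^T(t) + by^T(t)A^T + bb^T\}. \tag{21}$$

Substituting $E\{x(t)x^T(t)\} = \bar{x}\bar{x}^T + \Sigma_{xx}$, $E\{x(t)y^T(t)\} = \bar{x}\bar{y}^T + \Sigma_{xy}$, 
$E\{y(t)x^T(t)\} = \bar{y}\bar{x}^T + \Sigma_{yx}$, $E\{y(t)y^T(t)\} = \bar{y}\bar{y}^T + \Sigma_{yy}$ into (21) and 
completing the squares yields 

$$E\{(x(t) - E\{x(t) \mid y(t)\})(x(t) - E\{x(t) \mid y(t)\})^T\} = (\bar{x} - A\bar{y} - b)(\bar{x} - A\bar{y} - b)^T + \begin{bmatrix} I & -A \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} I \\ -A^T \end{bmatrix}. \tag{22}$$

The second term on the right-hand-side of (22) can be rearranged as 

$$\begin{bmatrix} I & -A \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} I \\ -A^T \end{bmatrix} = (A - \Sigma_{xy}\Sigma_{yy}^{-1})\Sigma_{yy}(A - \Sigma_{xy}\Sigma_{yy}^{-1})^T + \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}.$$

Thus, the choice $A = \Sigma_{xy}\Sigma_{yy}^{-1}$ and $b = \bar{x} - A\bar{y}$ minimises (22), which gives 

$$E\{x(t) \mid y(t)\} = \bar{x} + \Sigma_{xy}\Sigma_{yy}^{-1}(y(t) - \bar{y}) \tag{23}$$

and 

$$E\{(x(t) - E\{x(t) \mid y(t)\})(x(t) - E\{x(t) \mid y(t)\})^T\} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}. \tag{24}$$

The conditional mean estimate (23) is also known as the linear least mean square 
estimate [18]. An important property of the conditional mean estimate is 
established below. 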
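The formulas (23) and (24) can be exercised with a small Monte Carlo experiment. This is a sketch under stated assumptions: scalar jointly Gaussian $x$ and $y$ with hypothetical moments $\bar{x} = 1$, $\bar{y} = -2$, $\Sigma_{xx} = 2$, $\Sigma_{xy} = 0.8$, $\Sigma_{yy} = 1$. It compares the sample error variance with (24) and also reports the sample correlation between the estimation error and $y(t)$, which should be close to zero.

```python
import math
import random

def conditional_mean(y, x_bar, y_bar, s_xy, s_yy):
    """Scalar form of (23): x_bar + S_xy S_yy^{-1} (y - y_bar)."""
    return x_bar + s_xy / s_yy * (y - y_bar)

def simulate(n=20000, seed=1):
    """Draw n samples of (x, y) and return sample error statistics."""
    rng = random.Random(seed)
    x_bar, y_bar = 1.0, -2.0                 # hypothetical means
    s_xx, s_xy, s_yy = 2.0, 0.8, 1.0         # hypothetical covariances
    predicted = s_xx - s_xy ** 2 / s_yy      # error variance from (24)
    resid_std = math.sqrt(predicted)
    err_sq = 0.0
    cross = 0.0
    for _ in range(n):
        n1 = rng.gauss(0.0, 1.0)
        n2 = rng.gauss(0.0, 1.0)
        # Construct (x, y) with the requested second-order statistics.
        y = y_bar + math.sqrt(s_yy) * n1
        x = x_bar + s_xy / math.sqrt(s_yy) * n1 + resid_std * n2
        e = x - conditional_mean(y, x_bar, y_bar, s_xy, s_yy)
        err_sq += e * e
        cross += e * (y - y_bar)
    return err_sq / n, cross / n, predicted

sample_var, sample_cross, predicted_var = simulate()
print(sample_var, predicted_var, sample_cross)
```

The sample error variance fluctuates around the value given by (24), and the sample cross term fluctuates around zero, up to the usual sampling error of order $1/\sqrt{n}$.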


Lemma 3 (Orthogonal projections): In respect of the conditional mean estimate 
(23), in which the mean and covariances are respectively defined in (18) and (19), 
the error vector 

$$\tilde{x}(t) = x(t) - E\{x(t) \mid y(t)\} \tag{25}$$

is orthogonal to $y(t)$, that is, $E\{\tilde{x}(t)y^T(t)\} = 0$. 
Proof [8],[18]: From (23) and (25), it can be seen that 


“Statistics: The only science that enables different experts using the same figures to draw different 
conclusions." Evan Esar 
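$$E\{(x(t) - E\{x(t) \mid y(t)\})(y(t) - E\{y(t)\})^T\} = E\{(x(t) - \bar{x} - \Sigma_{xy}\Sigma_{yy}^{-1}(y(t) - \bar{y}))(y(t) - \bar{y})^T\}$$
$$\qquad = \Sigma_{xy} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yy}$$
$$\qquad = 0.$$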


Sufficient background material has now been introduced for the finite-horizon 
filter (for time-varying systems) to be derived. 


3.2. The Continuous-time Minimum-Variance Filter 
3.2.1 Derivation of the Optimal Filter 


Consider again the linear time-varying system $\mathcal{G}: \mathbb{R}^m \to \mathbb{R}^p$ having the state-space 
realisation 

$$\dot{x}(t) = A(t)x(t) + B(t)w(t), \tag{26}$$
$$y(t) = C(t)x(t), \tag{27}$$

where $A(t)$, $B(t)$, $C(t)$ are of appropriate dimensions and $w(t)$ is a white process 
with 

$$E\{w(t)\} = 0, \quad E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau). \tag{28}$$

Suppose that observations 

$$z(t) = y(t) + v(t) \tag{29}$$

are available, where $v(t) \in \mathbb{R}^p$ is a white measurement noise process with 

$$E\{v(t)\} = 0, \quad E\{v(t)v^T(\tau)\} = R(t)\delta(t - \tau) \tag{30}$$

and 

$$E\{w(t)v^T(\tau)\} = 0. \tag{31}$$

The objective is to design a linear system $\mathcal{H}$ that operates on the measurements 
$z(t)$ and produces an estimate $\hat{y}(t|t) = C(t)\hat{x}(t|t)$ of $y(t) = C(t)x(t)$ given 
measurements at time $t$, so that the covariance $E\{\tilde{y}(t|t)\tilde{y}^T(t|t)\}$ is minimised, 


“I have hardly known a mathematician who was capable of reasoning”. Plato 
where $\tilde{y}(t|t) = y(t) - \hat{y}(t|t)$ is the output estimation error. This output 
estimation problem is depicted in Fig. 2. 


It is desired that $\hat{x}(t|t)$ and the estimate $\dot{\hat{x}}(t|t)$ of $\dot{x}(t)$ are unbiased, namely 

$$E\{x(t) - \hat{x}(t|t)\} = 0, \tag{32}$$
$$E\{\dot{x}(t) - \dot{\hat{x}}(t|t)\} = 0. \tag{33}$$


Fig. 2. The continuous-time output estimation problem. The objective is to find an estimate $\hat{y}(t|t)$ of 
$y(t)$ which minimises $E\{(y(t) - \hat{y}(t|t))(y(t) - \hat{y}(t|t))^T\}$. 


If $\hat{x}(t|t)$ is a conditional mean estimate, from Lemma 3, criterion (32) will be 
met. Criterion (33) can be satisfied if it is additionally assumed that $E\{\dot{\hat{x}}(t|t)\} = 
A(t)\hat{x}(t|t)$, since this yields $E\{\dot{x}(t) - \dot{\hat{x}}(t|t)\} = A(t)E\{x(t) - \hat{x}(t|t)\} = 0$. 
Thus, substituting 

$$E\left\{\begin{bmatrix} \dot{\hat{x}}(t|t) \\ z(t) \end{bmatrix}\right\} = \begin{bmatrix} A(t)\hat{x}(t|t) \\ C(t)\hat{x}(t|t) \end{bmatrix}$$

into (23), yields the conditional mean estimate 

$$\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + K(t)\big(z(t) - C(t)\hat{x}(t|t)\big)$$
$$\qquad = \big(A(t) - K(t)C(t)\big)\hat{x}(t|t) + K(t)z(t), \tag{34}$$

where $K(t) = E\{x(t)z^T(t)\}E\{z(t)z^T(t)\}^{-1}$. Equation (34) is known as the continuous-time 
Kalman filter (or the Kalman-Bucy filter) and is depicted in Fig. 3. This filter 
employs the state matrix $A(t)$ akin to the signal generating model $\mathcal{G}$, which 
Kalman and Bucy call the message process [4]. The matrix $K(t)$ is known as the 
filter gain, which operates on the error residual, namely the difference between the 
measurement $z(t)$ and the estimated output $C(t)\hat{x}(t)$. The calculation of an 
optimal gain is addressed in the next section. 


“Art has a double face, of expression and illusion, just like science has a double face: the reality of 
error and the phantom of truth.” René Daumal 
Fig. 3. The continuous-time Kalman filter which is also known as the Kalman-Bucy filter. The filter 
calculates conditional mean estimates $\hat{x}(t|t)$ from the measurements $z(t)$. 


3.2.2 The Riccati Differential Equation 


Denote the state estimation error by $\tilde{x}(t|t) = x(t) - \hat{x}(t|t)$. It is shown below that 
the filter minimises the error covariance $E\{\tilde{x}(t|t)\tilde{x}^T(t|t)\}$ if the gain is calculated 
as 

$$K(t) = P(t)C^T(t)R^{-1}(t), \tag{35}$$

in which $P(t) = E\{\tilde{x}(t|t)\tilde{x}^T(t|t)\}$ is the solution of the Riccati differential 
equation 

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t). \tag{36}$$
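To see (34) - (36) working together, the following sketch simulates a scalar model and its Kalman-Bucy filter with a simple Euler discretisation. Everything specific here is an assumption made for illustration: the parameters $a = -1$, $b = c = q = 1$, $r = 0.01$, the step size, and the common approximation of continuous-time white noise by independent Gaussian samples of variance $q/\delta_t$ and $r/\delta_t$. It is not a method taken from the text.

```python
import math
import random

def kalman_bucy_scalar(a, b, c, q, r, p0, dt, steps, seed=0):
    """Euler simulation of a scalar signal model and the filter (34) - (36).

    Returns the final P and the mean-square state error over the second half
    of the run (the first half is discarded as a transient).
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, math.sqrt(p0))   # true state, x(0) ~ N(0, p0)
    xhat = 0.0                          # filter state
    p = p0                              # Riccati solution, P(0) = p0
    sq_err, count = 0.0, 0
    for k in range(steps):
        # Discretised white noises (variance scaled by 1/dt).
        w = rng.gauss(0.0, math.sqrt(q / dt))
        v = rng.gauss(0.0, math.sqrt(r / dt))
        z = c * x + v                                   # measurement (29)
        gain = p * c / r                                # gain (35)
        xhat_dot = a * xhat + gain * (z - c * xhat)     # filter (34)
        p_dot = 2.0 * a * p - (p * c) ** 2 / r + b * q * b   # Riccati (36)
        x += dt * (a * x + b * w)                       # signal model (26)
        xhat += dt * xhat_dot
        p += dt * p_dot
        if k >= steps // 2:
            sq_err += (x - xhat) ** 2
            count += 1
    return p, sq_err / count

p_final, mse = kalman_bucy_scalar(-1.0, 1.0, 1.0, 1.0, 0.01, 1.0, 0.002, 200000)
print(p_final, mse)
```

With these assumed values the Riccati solution settles near 0.0905 and the observed mean-square error is of similar size; a small discrepancy is expected from the discretisation and from sampling error.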


Lemma 4: In respect of the state estimation problem defined by (26) - (31), 
suppose that there exists a solution 

$$P(t) = P^T(t) \ge 0 \tag{37}$$

for the Riccati differential equation (36) satisfying 

$$A(t) - P(t)C^T(t)R^{-1}(t)C(t) \ge 0 \tag{38}$$

for all $t$ in the interval $[0, T]$. Then the filter (34) having the gain (35) minimises 
$P(t) = E\{\tilde{x}(t|t)\tilde{x}^T(t|t)\}$. 


Proof: Subtracting (34) from (26) results in 


“Somewhere, something incredible is waiting to be known.” Carl Edward Sagan 
$$\dot{\tilde{x}}(t|t) = \big(A(t) - K(t)C(t)\big)\tilde{x}(t|t) + B(t)w(t) - K(t)v(t). \tag{39}$$

Applying Lemma 2 to the error system (39) gives 

$$\dot{P}(t) = \big(A(t) - K(t)C(t)\big)P(t) + P(t)\big(A(t) - K(t)C(t)\big)^T + K(t)R(t)K^T(t) + B(t)Q(t)B^T(t) \tag{40}$$

which can be rearranged as 

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t)$$
$$\qquad + \big(K(t) - P(t)C^T(t)R^{-1}(t)\big)R(t)\big(K^T(t) - R^{-1}(t)C(t)P(t)\big) - P(t)C^T(t)R^{-1}(t)C(t)P(t). \tag{41}$$

Setting $\dot{P}(t)$ equal to the zero matrix results in a stationary point at (35) which 
leads to (36). From the differential of (36) 

$$\ddot{P}(t) = \big(A(t) - P(t)C^T(t)R^{-1}(t)C(t)\big)\dot{P}(t) + \dot{P}(t)\big(A^T(t) - C^T(t)R^{-1}(t)C(t)P(t)\big) \tag{42}$$

and it can be seen that $\ddot{P}(t) \ge 0$ provided that the assumptions (37) - (38) hold. 
Therefore, $P(t) = E\{\tilde{x}(t|t)\tilde{x}^T(t|t)\}$ is minimised at (35). 


The above development is somewhat brief and not very rigorous. Further 
discussions appear in [4] - [17]. It is tendered to show that the Kalman filter 
minimises the error covariance, provided of course that the problem assumptions 
are correct. In the case that it is desired to estimate an arbitrary linear combination 
$C_1(t)$ of states, the optimal filter is given by the system 

$$\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + K(t)\big(z(t) - C(t)\hat{x}(t|t)\big), \tag{43}$$
$$\hat{y}_1(t) = C_1(t)\hat{x}(t). \tag{44}$$

This filter minimises the error covariance $C_1(t)P(t)C_1^T(t)$. The generalisation of 
the Kalman filter for problems possessing deterministic inputs, correlated noises, 
and a direct feed-through term is developed below. 


“The worst wheel of the cart makes the most noise.” Benjamin Franklin 
3.2.3 Including Deterministic Inputs 


Suppose that the signal model is described by 

$$\dot{x}(t) = A(t)x(t) + B(t)w(t) + \mu(t), \tag{45}$$
$$y(t) = C(t)x(t) + \pi(t), \tag{46}$$

where $\mu(t)$ and $\pi(t)$ are deterministic (or known) inputs. In this case, the filtered 
state estimate can be obtained by including the deterministic inputs as follows 

$$\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + K(t)\big(z(t) - C(t)\hat{x}(t|t) - \pi(t)\big) + \mu(t), \tag{47}$$
$$\hat{y}(t) = C(t)\hat{x}(t) + \pi(t). \tag{48}$$

It is easily verified that subtracting (47) from (45) yields the error system (39) and 
therefore, the Kalman filter’s differential Riccati equation remains unchanged. 


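Example 2. Suppose that an object is falling under the influence of a gravitational 
field and it is desired to estimate its position over $[0, t]$ from noisy measurements. 
Denote the object’s vertical position, velocity and acceleration by $x(t)$, $\dot{x}(t)$ and 
$\ddot{x}(t)$, respectively. Let $g$ denote the gravitational constant. Then $\ddot{x}(t) = -g$ implies 
$\dot{x}(t) = \dot{x}(0) - gt$, so the model may be written as 

$$\begin{bmatrix} \dot{x}(t) \\ \ddot{x}(t) \end{bmatrix} = A\begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix} + \mu(t), \tag{49}$$
$$z(t) = C\begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix} + v(t),$$

where $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is the state matrix, $\mu(t) = \begin{bmatrix} \dot{x}(0) - gt \\ -g \end{bmatrix}$ is a deterministic input 
and $C = \begin{bmatrix} 1 & 0 \end{bmatrix}$ is the output mapping. Thus, the Kalman filter has the form 

$$\begin{bmatrix} \dot{\hat{x}}(t|t) \\ \ddot{\hat{x}}(t|t) \end{bmatrix} = A\begin{bmatrix} \hat{x}(t|t) \\ \dot{\hat{x}}(t|t) \end{bmatrix} + K\left(z(t) - C\begin{bmatrix} \hat{x}(t|t) \\ \dot{\hat{x}}(t|t) \end{bmatrix}\right) + \mu(t), \tag{50}$$
$$\hat{y}(t|t) = C(t)\hat{x}(t|t), \tag{51}$$

where the gain $K$ is calculated from (35) and (36), in which $BQB^T = 0$. 


“I am tired of all this thing called science here. We have spent millions in that sort of thing for the last 
few years, and it is time it should be stopped.” Simon Cameron 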
3.2.4 Including Correlated Process and Measurement Noise 


Suppose that the process and measurement noises are correlated, that is, 

$$E\left\{\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}\begin{bmatrix} w^T(\tau) & v^T(\tau) \end{bmatrix}\right\} = \begin{bmatrix} Q(t) & S(t) \\ S^T(t) & R(t) \end{bmatrix}\delta(t - \tau). \tag{52}$$

The equation for calculating the optimal state estimate remains of the form (34), 
however, the differential Riccati equation and hence the filter gain are different. 
The generalisation of the optimal filter that takes into account (52) was published 
by Kalman in 1963 [5]. Kalman’s approach was to first work out the 
corresponding discrete-time Riccati equation and then derive the continuous-time 
version. 


The correlated noises can be accommodated by defining the signal model 
equivalently as 

$$\dot{x}(t) = \bar{A}(t)x(t) + B(t)\bar{w}(t) + \mu(t), \tag{53}$$

where 

$$\bar{A}(t) = A(t) - B(t)S(t)R^{-1}(t)C(t) \tag{54}$$

is a new state matrix, 

$$\bar{w}(t) = w(t) - S(t)R^{-1}(t)v(t) \tag{55}$$

is a new stochastic input that is uncorrelated with $v(t)$, and 

$$\mu(t) = B(t)S(t)R^{-1}(t)z(t) \tag{56}$$

is a deterministic signal. It can easily be verified that the system (53) with the 
parameters (54) - (56), has the structure (26) with $E\{\bar{w}(t)v^T(\tau)\} = 0$. It is 
convenient to define 

$$\bar{Q}(t)\delta(t - \tau) = E\{\bar{w}(t)\bar{w}^T(\tau)\}$$
$$\qquad = E\{w(t)w^T(\tau)\} - E\{w(t)v^T(\tau)\}R^{-1}(t)S^T(t) - S(t)R^{-1}(t)E\{v(t)w^T(\tau)\} + S(t)R^{-1}(t)E\{v(t)v^T(\tau)\}R^{-1}(t)S^T(t)$$

“These, Gentlemen, are the opinions upon which I base my facts.” Winston Leonard Spencer-Churchill 
$$\qquad = \big(Q(t) - S(t)R^{-1}(t)S^T(t)\big)\delta(t - \tau). \tag{57}$$

The corresponding Riccati differential equation is obtained by substituting $\bar{A}(t)$ 
for $A(t)$ and $\bar{Q}(t)$ for $Q(t)$ within (36), namely, 

$$\dot{P}(t) = \bar{A}(t)P(t) + P(t)\bar{A}^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)\bar{Q}(t)B^T(t). \tag{58}$$

This can be rearranged to give 

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - K(t)R(t)K^T(t) + B(t)Q(t)B^T(t), \tag{59}$$

in which the gain is now calculated as 

$$K(t) = \big(P(t)C^T(t) + B(t)S(t)\big)R^{-1}(t). \tag{60}$$
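The rearrangement from (58) to (59) can be spot-checked numerically. The sketch below assumes scalar parameters with arbitrary hypothetical values and evaluates both right-hand sides; they should coincide for any admissible choice.

```python
def riccati_rhs_58(p, a, b, c, q, r, s):
    """Right-hand side of (58), using the modified parameters (54) and (57)."""
    a_bar = a - b * s * c / r        # (54), scalar case
    q_bar = q - s * s / r            # (57), scalar case
    return 2.0 * a_bar * p - (p * c) ** 2 / r + b * q_bar * b

def riccati_rhs_59(p, a, b, c, q, r, s):
    """Right-hand side of (59), using the gain (60)."""
    k = (p * c + b * s) / r          # (60), scalar case
    return 2.0 * a * p - k * r * k + b * q * b

# Hypothetical values chosen only for illustration.
params = dict(a=-0.7, b=1.3, c=0.9, q=2.0, r=0.5, s=0.4)
for p in (0.0, 0.3, 1.0, 2.5):
    print(p, riccati_rhs_58(p, **params), riccati_rhs_59(p, **params))
```

Both columns agree to rounding error, which reflects the algebraic identity $K R K^T = P C^T R^{-1} C P + P C^T R^{-1} S^T B^T + B S R^{-1} C P + B S R^{-1} S^T B^T$.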


3.2.5 Including a Direct Feedthrough Matrix 


The approach of the previous section can be used to address signal models that 
possess a direct feedthrough matrix, namely, 

$$\dot{x}(t) = A(t)x(t) + B(t)w(t), \tag{61}$$
$$y(t) = C(t)x(t) + D(t)w(t). \tag{62}$$

As before, the optimal state estimate is given by 

$$\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + K(t)\big(z(t) - C(t)\hat{x}(t|t)\big), \tag{63}$$

where the gain is obtained by substituting $S(t) = Q(t)D^T(t)$ into (60), 

$$K(t) = \big(P(t)C^T(t) + B(t)Q(t)D^T(t)\big)R^{-1}(t), \tag{64}$$

in which $P(t)$ is the solution of the Riccati differential equation 


“No human investigation can be called real science if it cannot be demonstrated mathematically.” 
Leonardo di ser Piero da Vinci 
$$\dot{P}(t) = \big(A(t) - B(t)Q(t)D^T(t)R^{-1}(t)C(t)\big)P(t) + P(t)\big(A(t) - B(t)Q(t)D^T(t)R^{-1}(t)C(t)\big)^T$$
$$\qquad - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)\big(Q(t) - Q(t)D^T(t)R^{-1}(t)D(t)Q(t)\big)B^T(t).$$

Note that the above Riccati equation simplifies to 

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - K(t)R(t)K^T(t) + B(t)Q(t)B^T(t). \tag{65}$$


3.3 The Continuous-time Steady-State Minimum-Variance Filter 


3.3.1 Riccati Differential Equation Monotonicity 


This section sets out the simplifications for the case where the signal model is 
stationary (or time-invariant). In this situation the structure of the Kalman filter is 
unchanged but the gain is fixed and can be pre-calculated. Consider the linear 
time-invariant system 


$$\dot{x}(t) = Ax(t) + Bw(t), \tag{66}$$
$$y(t) = Cx(t), \tag{67}$$

together with the observations 

$$z(t) = y(t) + v(t), \tag{68}$$

assuming that $\mathrm{Re}\{\lambda_i(A)\} < 0$, $E\{w(t)\} = 0$, $E\{w(t)w^T(\tau)\} = Q\delta(t - \tau)$, $E\{v(t)\} = 0$, 
$E\{v(t)v^T(\tau)\} = R\delta(t - \tau)$ and $E\{w(t)v^T(\tau)\} = 0$. It follows from the approach of Section 3.2 
that the Riccati differential equation for the corresponding Kalman filter is given 
by 

$$\dot{P}(t) = AP(t) + P(t)A^T - P(t)C^TR^{-1}CP(t) + BQB^T. \tag{69}$$


It will be shown that the solution for P(t) monotonically approaches a steady-state 
asymptote, in which case the filter gain can be calculated before running the filter. 
The following result is required to establish that the solutions of the above Riccati 
differential equation are monotonic. 


“Today's scientists have substituted mathematics for experiments, and they wander off through 
equation after equation, and eventually build a structure which has no relation to reality.” Nikola Tesla 
Lemma 5 [11], [19], [20]: Suppose that $X(t)$ is a solution of the Lyapunov 
differential equation 

$$\dot{X}(t) = AX(t) + X(t)A^T \tag{70}$$

over an interval $t \in [0, T]$. Then the existence of a solution $X(t_0) \ge 0$ implies $X(t) 
\ge 0$ for all $t \in [0, T]$. 


Proof: Denote the transition matrix of $\dot{x}(t) = -A^Tx(t)$ by $\Phi(t, \tau)$, for which 
$\dot{\Phi}(t, \tau) = -A^T\Phi(t, \tau)$ and $\dot{\Phi}^T(t, \tau) = -\Phi^T(t, \tau)A$. Let $P(t) = 
\Phi^T(t, \tau)X(t)\Phi(t, \tau)$, then from (70) 

$$0 = \Phi^T(t, \tau)\big(\dot{X}(t) - AX(t) - X(t)A^T\big)\Phi(t, \tau)$$
$$\quad = \Phi^T(t, \tau)\dot{X}(t)\Phi(t, \tau) + \dot{\Phi}^T(t, \tau)X(t)\Phi(t, \tau) + \Phi^T(t, \tau)X(t)\dot{\Phi}(t, \tau)$$
$$\quad = \dot{P}(t).$$

Hence $P(t)$ is constant and, since $\Phi(t, \tau)$ is nonsingular, $X(t)$ is congruent to $P(t_0)$. 
Therefore, a solution $X(t_0) \ge 0$ of (70) implies that $X(t) \ge 0$ for all $t \in [0, T]$. 


The monotonicity of Riccati differential equations has been studied by Bucy [6], 
Wonham [23], Poubelle et al. [19] and Freiling [20]. The latter’s simple proof is 
employed below. 


Lemma 6 [19], [20]: Suppose for a $t \ge 0$ and a $\delta_t > 0$ there exist solutions $P(t) \ge 0$ 
and $P(t + \delta_t) \ge 0$ of the Riccati differential equations 

$$\dot{P}(t) = AP(t) + P(t)A^T - P(t)C^TR^{-1}CP(t) + BQB^T \tag{71}$$

and 

$$\dot{P}(t + \delta_t) = AP(t + \delta_t) + P(t + \delta_t)A^T - P(t + \delta_t)C^TR^{-1}CP(t + \delta_t) + BQB^T, \tag{72}$$

respectively, such that $P(t) - P(t + \delta_t) \ge 0$. Then the sequence of matrices $P(t)$ is 
monotonic nonincreasing, that is, 

$$P(t) - P(t + \delta_t) \ge 0, \quad \text{for all } t \ge \delta_t. \tag{73}$$


“Mathematics is written for mathematicians.” Nicholaus Copernicus 


G. A. Einicke, Smoothing, Filtering and Prediction: Estimating 
the Past, Present and Future (2nd ed.), Prime Publishing, 2019 


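Proof: The conditions of the Lemma are the initial step of an induction argument. 
For the induction step, denote $\tilde{P}(\delta_t) = P(t) - P(t + \delta_t)$, $\dot{\tilde{P}}(\delta_t) = \dot{P}(t) - 
\dot{P}(t + \delta_t)$ and $\bar{A} = A - P(t + \delta_t)C^TR^{-1}C - 0.5\tilde{P}(\delta_t)C^TR^{-1}C$. Then 

$$\dot{\tilde{P}}(\delta_t) = A\tilde{P}(\delta_t) + \tilde{P}(\delta_t)A^T - P(t)C^TR^{-1}CP(t) + P(t + \delta_t)C^TR^{-1}CP(t + \delta_t)$$
$$\qquad = \bar{A}\tilde{P}(\delta_t) + \tilde{P}(\delta_t)\bar{A}^T,$$

which is of the form (70), and so the result (73) follows. 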


A monotonic nondecreasing case can be established similarly — see [20]. 


3.3.2 Observability 


The continuous-time system (66) - (67) is termed completely observable if the 
initial states, $x(t_0)$, can be uniquely determined from the inputs and outputs, $w(t)$ 
and $y(t)$, respectively, over an interval $[0, T]$. A simple test for observability is 
given by the following lemma. 


Lemma 7 [10], [21]. Suppose that $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{p \times n}$. The system is 
observable if and only if the observability matrix $O \in \mathbb{R}^{np \times n}$ is of rank $n$, where 

$$O = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}. \tag{74}$$

Proof: Recall from Chapter 2 that the solution of (66) is 

$$x(t) = e^{At}x(t_0) + \int_{t_0}^{t} e^{A(t - \tau)}Bw(\tau)\,d\tau. \tag{75}$$

Since the input signal $w(t)$ within (66) is known, it suffices to consider the 
unforced system $\dot{x}(t) = Ax(t)$ and $y(t) = Cx(t)$, that is, $Bw(t) = 0$, which leads to 

$$y(t) = Ce^{At}x(t_0). \tag{76}$$


The exponential matrix is defined as 


“You can observe a lot by just watching.” Lawrence Peter (Yogi) Berra 
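$$e^{At} = I + At + \frac{A^2t^2}{2} + \cdots + \frac{A^{N-1}t^{N-1}}{(N-1)!} + \cdots = \lim_{N \to \infty}\sum_{k=0}^{N-1} \alpha_k(t)A^k, \tag{77}$$

where $\alpha_k(t) = t^k / k!$. Substituting the $N$-term partial sum of (77) into (76) gives 

$$y(t) = \sum_{k=0}^{N-1} \alpha_k(t)CA^kx(t_0)$$
$$\quad = \alpha_0(t)Cx(t_0) + \alpha_1(t)CAx(t_0) + \cdots + \alpha_{N-1}(t)CA^{N-1}x(t_0)$$
$$\quad = \begin{bmatrix} \alpha_0(t) & \alpha_1(t) & \cdots & \alpha_{N-1}(t) \end{bmatrix}\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{N-1} \end{bmatrix}x(t_0). \tag{78}$$

From the Cayley-Hamilton Theorem [22], 

$$\mathrm{rank}\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{N-1} \end{bmatrix} = \mathrm{rank}\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}$$

for all $N \ge n$. Therefore, we can take $N = n$ within (78). Thus, equation (78) 
uniquely determines $x(t_0)$ if and only if $O$ has full rank $n$. 

A system that does not satisfy the above criterion is said to be unobservable. An 
alternate proof for the above lemma is provided in [10]. If a signal model is not 
observable then a Kalman filter cannot estimate all the states from the 
measurements. 


Example 3. The pair $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $C = \begin{bmatrix} 1 & 0 \end{bmatrix}$ is expected to be unobservable 
because one of the two states appears as a system output whereas the other is 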


“Who will observe the observers ?” Arthur Stanley Eddington 
hidden. By inspection, the rank of the observability matrix, $\begin{bmatrix} C \\ CA \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}$, is 1. 
Suppose instead that $C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, namely measurements of both states are 
available. Since the observability matrix $\begin{bmatrix} C \\ CA \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$ is of rank 2, the pair $(A, 
C)$ is observable, that is, the states can be uniquely reconstructed from the 
measurements. 
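The rank test of Lemma 7 is straightforward to automate. The following sketch assumes NumPy is available; the helper names `observability_matrix` and `is_observable` are hypothetical. It builds the matrix (74) and repeats the two cases of Example 3.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) as in (74)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)   # next block row: previous row times A
    return np.vstack(blocks)

def is_observable(A, C):
    """Return True when the observability matrix has full column rank n."""
    O = observability_matrix(A, C)
    return int(np.linalg.matrix_rank(O)) == O.shape[1]

A = np.eye(2)
print(is_observable(A, [[1.0, 0.0]]))   # only the first state is measured
print(is_observable(A, np.eye(2)))      # both states are measured
```

Note that `numpy.linalg.matrix_rank` decides the rank from singular values with a tolerance, so nearly unobservable pairs may need a closer look at the singular values themselves.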


3.3.3 The Algebraic Riccati Equation 


Some pertinent facts concerning the Riccati differential equation (69) are: 
• Its solutions correspond to the covariance of the state estimation error. 

• From Lemma 6, if it is suitably initialised then its solutions will be 
  monotonically nonincreasing. 

• If the pair $(A, C)$ is observable then the states can be uniquely determined 
  from the outputs and, in the limit as $t$ approaches infinity, the Riccati 
  differential equation will have a unique steady state solution. 


Lemma 8 [20], [23], [24]: Suppose that $\mathrm{Re}\{\lambda_i(A)\} < 0$ and the pair $(A, C)$ is 
observable, then the solution of the Riccati differential equation (69) satisfies 

$$\lim_{t \to \infty} P(t) = P, \tag{79}$$

where $P$ is the solution of the algebraic Riccati equation 

$$0 = AP + PA^T - PC^TR^{-1}CP + BQB^T. \tag{80}$$


A proof that the solution $P$ is in fact unique appears in [24]. A standard way for 
calculating solutions to (80) arises by finding an appropriate set of Schur vectors 
for the Hamiltonian matrix $H = \begin{bmatrix} A & -C^TR^{-1}C \\ BQB^T & -A^T \end{bmatrix}$, see [25] and the 
Hamiltonian solver within Matlab™. 


“Stand firm in your refusal to remain conscious during algebra. In real life, I assure you, there is no 
such thing as algebra.” Francis Ann Lebowitz 


Example 4. Suppose that $A = -1$ and $B = C = Q = R = 1$, for which the solution of 
the algebraic Riccati equation (80) is $P = \sqrt{2} - 1 \approx 0.4142$. Using Euler’s integration 
method (see Chapter 1) with $\delta_t = 0.01$ and $P(0) = 1$, the calculated solutions of the 
Riccati differential equation (69) are listed in Table 1. The data in the table 
demonstrate that the Riccati differential equation solution converges to the 
algebraic Riccati equation solution and $\lim_{t \to \infty} \dot{P}(t) = 0$. 


t        P(t)      $\dot{P}(t)$ 

1        0.9800    -2.00 

10       0.8316    -1.41 

100      0.4419    -8.13 × 10^{-2} 

1000     0.4142    -4.86 × 10^{-13} 


Table 1. Solutions of (69) for Example 4. 
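The Euler iteration of Example 4 can be reproduced with a few lines of code. This is a minimal sketch, assuming that the index $t$ in Table 1 counts Euler steps of size $\delta_t = 0.01$ and that the tabulated derivative is the one used to reach that step. For these parameters the positive root of (80) is $\sqrt{2} - 1 \approx 0.4142$.

```python
import math

def riccati_rhs(p, a=-1.0, b=1.0, c=1.0, q=1.0, r=1.0):
    """Scalar form of the right-hand side of (69)."""
    return 2.0 * a * p - (p * c) ** 2 / r + b * q * b

def euler_riccati(p0=1.0, dt=0.01, steps=1000):
    """Return {k: (P after k Euler steps, derivative used for step k)}."""
    history = {}
    p = p0
    for k in range(1, steps + 1):
        p_dot = riccati_rhs(p)
        p = p + dt * p_dot
        history[k] = (p, p_dot)
    return history

history = euler_riccati()
for k in (1, 10, 100, 1000):
    p, p_dot = history[k]
    print(k, round(p, 4), p_dot)
print("algebraic Riccati solution:", math.sqrt(2.0) - 1.0)
```

Because a fixed point of the Euler recursion is exactly a root of the algebraic Riccati equation, the iterates approach $\sqrt{2} - 1$ regardless of the (sufficiently small) step size.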


The so-called infinite-horizon (or stationary) Kalman filter is obtained by 
substituting time-invariant state-space parameters into (34) - (35) to give 


$$\dot{\hat{x}}(t|t) = (A - KC)\hat{x}(t|t) + Kz(t), \tag{81}$$
$$\hat{y}(t|t) = C\hat{x}(t|t), \tag{82}$$

where 

$$K = PC^TR^{-1}, \tag{83}$$

in which $P$ is calculated by solving the algebraic Riccati equation (80). The output 
estimation filter (81) - (82) has the transfer function 

$$H_{OE}(s) = C(sI - A + KC)^{-1}K. \tag{84}$$
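A steady-state gain can be computed numerically from a Hamiltonian matrix. The sketch below is an eigenvector variant rather than the Schur-vector method, and it assumes NumPy. It uses the standard filtering form $Z = \begin{bmatrix} A^T & -C^TR^{-1}C \\ -BQB^T & -A \end{bmatrix}$, which is a choice made here and is arranged differently from the matrix written in the text. If the columns of $[U_1^T \; U_2^T]^T$ span the stable invariant subspace of $Z$, then $P = U_2U_1^{-1}$ satisfies (80). The function names are hypothetical.

```python
import numpy as np

def steady_state_filter(A, B, C, Q, R):
    """Solve (80) from the stable eigenvectors of a Hamiltonian matrix and
    return (P, K) with K given by (83). Assumes no eigenvalues of the
    Hamiltonian lie on the imaginary axis."""
    A, B, C, Q, R = (np.atleast_2d(np.asarray(M, dtype=float))
                     for M in (A, B, C, Q, R))
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    Z = np.block([[A.T, -C.T @ R_inv @ C],
                  [-B @ Q @ B.T, -A]])
    eigvals, eigvecs = np.linalg.eig(Z)
    stable = eigvecs[:, eigvals.real < 0]        # n stable eigenvectors
    U1, U2 = stable[:n, :], stable[n:, :]
    P = np.real(U2 @ np.linalg.inv(U1))
    P = 0.5 * (P + P.T)                          # remove rounding asymmetry
    K = P @ C.T @ R_inv
    return P, K

def h_oe(s, A, C, K):
    """Evaluate the transfer function (84) at a complex frequency s."""
    A, C, K = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, C, K))
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A + K @ C, K)

P, K = steady_state_filter(-1.0, 1.0, 1.0, 1.0, 1.0)   # Example 4 parameters
print(P, K, h_oe(0.0, -1.0, 1.0, K))
```

For larger or poorly conditioned problems a Schur-based routine such as `scipy.linalg.solve_continuous_are` is usually preferable; the eigenvector route is shown only because it keeps the sketch short.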


"Arithmetic is being able to count up to twenty without taking off your shoes." Mickey Mouse 
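Example 5. Suppose that a signal $y(t) \in \mathbb{R}$ is generated by the system 

$$y(t) = \frac{b_m\frac{d^m}{dt^m} + b_{m-1}\frac{d^{m-1}}{dt^{m-1}} + \cdots + b_1\frac{d}{dt} + b_0}{a_n\frac{d^n}{dt^n} + a_{n-1}\frac{d^{n-1}}{dt^{n-1}} + \cdots + a_1\frac{d}{dt} + a_0}\,w(t).$$

This system’s transfer function is 

$$G(s) = \frac{b_ms^m + b_{m-1}s^{m-1} + \cdots + b_1s + b_0}{a_ns^n + a_{n-1}s^{n-1} + \cdots + a_1s + a_0},$$

which can be realised in the controllable canonical form [10] 

$$A = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \ddots & & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} b_m & b_{m-1} & \cdots & b_1 & b_0 \end{bmatrix}.$$


Fig. 4. The optimal filter for Example 5. 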


The optimal filter for estimating $y(t)$ from noisy measurements (29) is obtained by 
using the above state-space parameters within (81) — (83). It has the structure 
depicted in Figs. 3 and 4. These figures illustrate two features of interest. First, the 
filter’s model matches that within the signal generating process. Second, designing 
the filter is tantamount to finding an optimal gain. 


“If you think dogs can’t count, try putting three dog biscuits in your pocket and then giving Fido two of 
them.” Phil Pastoret 
3.3.4 Equivalence of the Wiener and Kalman Filters 


When the model parameters and noise statistics are time-invariant, the Kalman 
filter reverts to the Wiener filter. The equivalence of the Wiener and Kalman 
filters implies that spectral factorisation is the same as solving a Riccati equation. 
This observation is known as the Kalman-Yakubovich-Popov Lemma (or Positive 
Real Lemma) [15], [26], which assumes familiarity with the following Schur 
complement formula. 


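For any matrices $\Phi_{11}$, $\Phi_{12}$ and $\Phi_{22}$, where $\Phi_{11}$ and $\Phi_{22}$ are symmetric, the 
following are equivalent. 

(i) $\begin{bmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{12}^T & \Phi_{22} \end{bmatrix} \ge 0$. 

(ii) $\Phi_{11} \ge 0$, $\Phi_{22} \ge \Phi_{12}^T\Phi_{11}^{-1}\Phi_{12}$. 

(iii) $\Phi_{22} \ge 0$, $\Phi_{11} \ge \Phi_{12}\Phi_{22}^{-1}\Phi_{12}^T$. 

The Kalman-Yakubovich-Popov Lemma is set out below. Further details appear in 
[15] and a historical perspective is provided in [26]. A proof of this Lemma makes 
use of the identity 

$$-PA^T - AP = P(-sI - A^T) + (sI - A)P. \tag{85}$$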


Lemma 9 [15], [26]: Consider the spectral density matrix 

$$\Delta\Delta^H(s) = \begin{bmatrix} C(sI - A)^{-1}B & I \end{bmatrix}\begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix}\begin{bmatrix} B^T(-sI - A^T)^{-1}C^T \\ I \end{bmatrix}. \tag{86}$$

Then the following statements are equivalent: 

(i) $\Delta\Delta^H(j\omega) \ge 0$ for all $\omega \in (-\infty, \infty)$. 

(ii) $\begin{bmatrix} BQB^T + AP + PA^T & PC^T \\ CP & R \end{bmatrix} \ge 0$. 

(iii) There exists a nonnegative solution $P$ of the algebraic Riccati equation 
(80). 


Proof: To establish equivalence between (i) and (iii), use (85) within (80) to 
obtain 

$$P(-sI - A^T) + (sI - A)P = BQB^T - PC^TR^{-1}CP. \tag{87}$$


“Mathematics is the queen of sciences and arithmetic is the queen of mathematics.” Carl Friedrich 
Gauss 
Premultiplying and postmultiplying (87) by $C(sI - A)^{-1}$ and $(-sI - A^T)^{-1}C^T$, 
respectively, results in 

$$C(sI - A)^{-1}PC^T + CP(-sI - A^T)^{-1}C^T = C(sI - A)^{-1}\big(BQB^T - PC^TR^{-1}CP\big)(-sI - A^T)^{-1}C^T. \tag{88}$$

Hence, 

$$\Delta\Delta^H(s) = GQG^H(s) + R$$
$$\quad = C(sI - A)^{-1}BQB^T(-sI - A^T)^{-1}C^T + R$$
$$\quad = C(sI - A)^{-1}PC^TR^{-1}CP(-sI - A^T)^{-1}C^T + C(sI - A)^{-1}PC^T + CP(-sI - A^T)^{-1}C^T + R$$
$$\quad = \big(C(sI - A)^{-1}KR^{1/2} + R^{1/2}\big)\big(R^{1/2}K^T(-sI - A^T)^{-1}C^T + R^{1/2}\big)$$
$$\quad \ge 0. \tag{89}$$

The Schur complement formula can be used to verify the equivalence of (ii) and 
(iii). 

In Chapter 1, it is shown that the transfer function matrix of the optimal Wiener 
solution for output estimation is given by 

$$H_{OE}(s) = I - R^{1/2}\Delta^{-1}(s), \tag{90}$$

where $s = j\omega$ and 

$$\Delta\Delta^H(s) = GQG^H(s) + R \tag{91}$$

is the spectral density matrix of the measurements. It follows from (91) that 

$$\Delta(s) = C(sI - A)^{-1}KR^{1/2} + R^{1/2}. \tag{92}$$

The Wiener filter (90) requires the spectral factor inverse, $\Delta^{-1}(s)$, which can be 
found from (92) and using $[I + C(sI - A)^{-1}K]^{-1} = I - C(sI - A + KC)^{-1}K$ to obtain 

$$\Delta^{-1}(s) = R^{-1/2} - R^{-1/2}C(sI - A + KC)^{-1}K. \tag{93}$$

Substituting (93) into (90) yields 


"Mathematics consists in proving the most obvious thing in the least obvious way." George Polya 
$H_{OE}(s) = C(sI - A + KC)^{-1}K$, (94) 
which is identical to the minimum-variance output estimator (84). 

Example 6. Consider a scalar output estimation problem where $G(s) = (s + 1)^{-1}$, Q 
= 1, R = 0.0001 and the Wiener filter transfer function is 
$H(s) = 99(s + 100)^{-1}$. (95) 


Applying the bilinear transform yields A = -1, B = C = 1, for which the solution of 
(80) is P = 0.0099. By substituting $K = PC^TR^{-1}$ = 99 into (90), one obtains (95). 
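
The numbers quoted in Example 6 can be checked with a few lines of code. The following is a minimal sketch, assuming the scalar form of the algebraic Riccati equation (80), namely 0 = 2AP - C^2 P^2 / R + B^2 Q; the function name `scalar_care` is illustrative and does not appear in the text.

```python
import math

def scalar_care(a, b, c, q, r):
    """Nonnegative root of 0 = 2*a*p - (c*p)**2 / r + b*b*q (scalar form of (80))."""
    # Rearranged as the quadratic alpha*p^2 - 2*a*p - b^2*q = 0 with alpha = c^2 / r.
    alpha = c * c / r
    return (a + math.sqrt(a * a + alpha * b * b * q)) / alpha

# Parameters of Example 6.
a, b, c, q, r = -1.0, 1.0, 1.0, 1.0, 1e-4
p = scalar_care(a, b, c, q, r)   # error variance
k = p * c / r                    # filter gain K = P C^T R^{-1}
pole = a - k * c                 # pole of C (sI - A + KC)^{-1} K
print(p, k, pole)
```

The computed values are close to P = 0.0099 and K = 99, and the filter pole is close to -100, in line with (95).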


3.4 Chapter Summary 


The Kalman-Bucy filter, which produces state estimates $\hat{x}(t|t)$ and output 
estimates $\hat{y}(t|t)$ from the measurements z(t) = y(t) + v(t) at time t, is summarised 
in Table 2. This filter minimises the variances of the state estimation error 
$E\{(x(t) - \hat{x}(t|t))(x(t) - \hat{x}(t|t))^T\} = P(t)$ and the output estimation error 
$E\{(y(t) - \hat{y}(t|t))(y(t) - \hat{y}(t|t))^T\} = C(t)P(t)C^T(t)$. 


ASSUMPTIONS and MAIN RESULTS 

Signals and system 
Assumptions: E{w(t)} = E{v(t)} = 0. $E\{w(t)w^T(t)\} = Q(t)$ and $E\{v(t)v^T(t)\} = R(t)$ are known. A(t), B(t) and C(t) are known. 
Main results: $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $y(t) = C(t)x(t)$, $z(t) = y(t) + v(t)$. 

Filtered state and output 
Main results: $\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + K(t)(z(t) - C(t)\hat{x}(t|t))$, $\hat{y}(t|t) = C(t)\hat{x}(t|t)$. 

Filter gain and Riccati differential equation 
Assumptions: Q(t) > 0 and R(t) > 0. 
Main results: $K(t) = P(t)C^T(t)R^{-1}(t)$, $\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t)$. 

Table 2. Main results for time-varying output estimation. 


“There are two ways to do great mathematics. The first is to be smarter than everybody else. The 
second way is to be stupider than everybody else - but persistent." Raoul Bott 
When the model parameters and noise covariances are time-invariant, the gain is 
also time-invariant and can be precalculated. The time-invariant filtering results 
are summarised in Table 3. In this stationary case, spectral factorisation is 
equivalent to solving a Riccati equation and the transfer function of the output 
estimation filter, $H_{OE}(s) = C(sI - A + KC)^{-1}K$, is identical to that of the Wiener 
filter. It is not surprising that the Wiener and Kalman filters are equivalent since 
they are both derived by completing the square of the error covariance. 


ASSUMPTIONS and MAIN RESULTS 

Signals and system 
Assumptions: E{w(t)} = E{v(t)} = 0. $E\{w(t)w^T(t)\} = Q$ and $E\{v(t)v^T(t)\} = R$ are known. A, B and C are known. The pair (A, C) is observable. 
Main results: $\dot{x}(t) = Ax(t) + Bw(t)$, $y(t) = Cx(t)$, $z(t) = y(t) + v(t)$. 

Filtered state and output 
Main results: $\dot{\hat{x}}(t|t) = A\hat{x}(t|t) + K(z(t) - C\hat{x}(t|t))$, $\hat{y}(t|t) = C\hat{x}(t|t)$. 

Filter gain and algebraic Riccati equation 
Assumptions: Q > 0 and R > 0. 
Main results: $K = PC^TR^{-1}$, $0 = AP + PA^T - PC^TR^{-1}CP + BQB^T$. 

Table 3. Main results for time-invariant output estimation. 


3.5 Problems 


Problem 1. Show that $\dot{x}(t) = A(t)x(t)$ has the solution $x(t) = \Phi(t,0)x(0)$ where 
$\dot{\Phi}(t,0) = A(t)\Phi(t,0)$ and $\Phi(t,t) = I$. Hint: use the approach of [13] and 
integrate both sides of $\dot{x}(t) = A(t)x(t)$. 


Problem 2. Given that: 

(i) the Lyapunov differential equation for the system $\dot{x}(t) = F(t)x(t) + G(t)w(t)$ is $\frac{d}{dt}E\{x(t)x^T(t)\} = F(t)E\{x(t)x^T(t)\} + E\{x(t)x^T(t)\}F^T(t) + G(t)Q(t)G^T(t)$; 

(ii) the Kalman filter for the system $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$ has the structure $\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + K(t)(z(t) - C(t)\hat{x}(t|t))$; 

write a Riccati differential equation for the evolution of the state error covariance 
and determine the optimal gain matrix K(t). 


“The scientific man does not aim at an immediate result. He does not expect that his advanced ideas 
will be readily taken up. His work is like that of the planter - for the future. His duty is to lay the 
foundation for those who are to come, and point the way.” Nikola Tesla 


Problem 3. Derive the Riccati differential equation for the model $\dot{x}(t) = A(t)x(t) 
+ B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$ with $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, $E\{v(t)v^T(\tau)\} = 
R(t)\delta(t - \tau)$ and $E\{w(t)v^T(\tau)\} = S(t)\delta(t - \tau)$. Hint: consider $\dot{x}(t) = A(t)x(t) + 
B(t)w(t) + B(t)S(t)R^{-1}(t)(z(t) - C(t)x(t) - v(t))$. 

Problem 4. For output estimation problems with B = C = R = 1, calculate the 
algebraic Riccati equation solution, filter gain and transfer function for the 
following. 

(a) A = -1 and Q = 8. (b) A = -2 and Q = 12. 
(c) A = -3 and Q = 16. (d) A = -4 and Q = 20. 
(e) A = -5 and Q = 24. (f) A = -6 and Q = 28. 
(g) A = -7 and Q = 32. (h) A = -8 and Q = 36. 
(i) A = -9 and Q = 40. (j) A = -10 and Q = 44. 


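Problem 5. Prove the Kalman-Yakubovich-Popov Lemma for the case of 

$E\left\{\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}\begin{bmatrix} w^T(\tau) & v^T(\tau) \end{bmatrix}\right\} = \begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\delta(t - \tau)$, i.e., show 

$\Delta\Delta^H(s) = \begin{bmatrix} C(sI - A)^{-1} & I \end{bmatrix}\begin{bmatrix} Q & S \\ S^T & R \end{bmatrix}\begin{bmatrix} (-sI - A^T)^{-1}C^T \\ I \end{bmatrix}$. 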

Problem 6. Derive a state space formulation for the minimum-mean-square-error 
equaliser using 


$\Delta^{-1}(s) = R^{-1/2} - R^{-1/2}C(sI - A + KC)^{-1}K$. 


3.6 Glossary 


In addition to the terms listed in Chapter 1, the following have been used herein. 


$\mathcal{G} : \mathbb{R}^p \to \mathbb{R}^q$: A linear system that operates on a p-element input signal and produces a q-element output signal. 

A(t), B(t), C(t), D(t): Time-varying state space matrices of appropriate dimension. The system $\mathcal{G}$ is assumed to have the realisation $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $y(t) = C(t)x(t) + D(t)w(t)$. 

Q(t) and R(t): Covariance matrices of the nonstationary stochastic signals w(t) and v(t), respectively. 

$\Phi(t,0)$: State transition matrix which satisfies $\dot{\Phi}(t,0) = \frac{d\Phi(t,0)}{dt} = A(t)\Phi(t,0)$ with the boundary condition $\Phi(t,t) = I$. 

$\mathcal{G}^H$: Adjoint of $\mathcal{G}$. The adjoint of a system having the state-space parameters {A(t), B(t), C(t), D(t)} is a system parameterised by $\{-A^T(t), -C^T(t), B^T(t), D^T(t)\}$. 

E{.}, E{x(t)}: Expectation operator, expected value of x(t). 

E{x(t) | y(t)}: Conditional expectation, namely the estimate of x(t) given y(t). 

$\hat{x}(t|t)$: Conditional mean estimate of the state x(t) given data at time t. 

$\tilde{x}(t|t)$: State estimation error which is defined by $\tilde{x}(t|t) = x(t) - \hat{x}(t|t)$. 

K(t): Time-varying filter gain matrix. 

P(t): Time-varying error covariance, i.e., $E\{\tilde{x}(t)\tilde{x}^T(t)\}$, which is the solution of a Riccati differential equation. 

A, B, C, D: Time-invariant state space matrices of appropriate dimension. 

Q and R: Time-invariant covariance matrices of the stationary stochastic signals w(t) and v(t), respectively. 

O: Observability matrix. 

SNR: Signal to noise ratio. 

K: Time-invariant filter gain matrix. 

P: Time-invariant error covariance which is the solution of an algebraic Riccati equation. 

H: Hamiltonian matrix. 

G(s): Transfer function matrix of the signal model. 

H(s): Transfer function matrix of the minimum-variance solution. 

$H_{OE}(s)$: Transfer function matrix of the minimum-variance solution specialised for output estimation. 


“Mathematics is a game played according to certain simple rules with meaningless marks on paper.” 
David Hilbert 
4. Discrete-Time, Minimum-Variance Filtering 


4.1 Introduction 


Kalman filters are employed wherever it is desired to recover data from the noise 
in an optimal way, such as in satellite orbit estimation, aircraft guidance, radar, 
communication systems, navigation, medical diagnosis and finance. Continuous-time 
problems that possess differential equations may be easier to describe in a 
state-space framework; however, the filters have higher implementation costs 
because an additional integration step and higher sampling rates are required. 
Conversely, although discrete-time state-space models may be less intuitive, the 
ensuing filter difference equations can be realised immediately. 


The discrete-time Kalman filter calculates predicted states via the linear recursion 

$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1})$, 

where the predictor gain, $K_k$, is a function of the noise statistics and the model 
parameters. The above formula was reported by Rudolf E. Kalman in the 1960s 
[1], [2]. He has since received many awards and prizes, including the National 
Medal of Science, which was presented to him by President Barack Obama in 
2009. 


The Kalman filter calculations are simple and well-established. A possibly 
troublesome obstacle is expressing problems at hand within a state-space 
framework. This chapter derives the main discrete-time results to provide 
familiarity with state-space techniques and filter application. The continuous-time 
and discrete-time minimum-square-error Wiener filters were derived using a 
completing-the-square approach in Chapters 1 and 2, respectively. Similarly for 
time-varying continuous-time signal models, the derivation of the minimum- 
variance Kalman filter, presented in Chapter 3, relied on a least-mean-square (or 
conditional-mean) formula. This formula is used again in the solution of the 
discrete-time prediction and filtering problems. Predictions can be used when the 
measurements are irregularly spaced or missing at the cost of increased mean- 
square-error. 


“Man will occasionally stumble over the truth, but most of the time he will pick himself up and 
continue on.” Winston Leonard Spencer-Churchill 
This chapter develops the prediction and filtering results for the case where the 
problem is nonstationary or time-varying. It is routinely assumed that the process 
and measurement noises are zero mean and uncorrelated. Nonzero mean cases can 
be accommodated by including deterministic inputs within the state prediction and 
filter output updates. Correlated noises can be handled by adding a term within the 
predictor gain and the underlying Riccati equation. The same approach is 
employed when the signal model possesses a direct-feedthrough term. A 
simplification of the generalised regulator problem from control theory is 
presented, from which the solutions of output estimation, input estimation (or 
equalisation), state estimation and mixed filtering problems follow immediately. 


Fig. 1. The discrete-time system $\mathcal{G}$ operates on the input signal $w_k$ and produces the output $y_k$. 


4.2. The Time-varying Signal Model 


A discrete-time time-varying system $\mathcal{G} : \mathbb{R}^m \to \mathbb{R}^p$ is assumed to have the 
state-space representation 


$x_{k+1} = A_kx_k + B_kw_k$, (1) 
$y_k = C_kx_k + D_kw_k$, (2) 

where $A_k \in \mathbb{R}^{n \times n}$, $B_k \in \mathbb{R}^{n \times m}$, $C_k \in \mathbb{R}^{p \times n}$ and $D_k \in \mathbb{R}^{p \times m}$ over a finite interval $k \in [1, N]$. The $w_k$ is a stochastic white process with 

$E\{w_k\} = 0$, $E\{w_jw_k^T\} = Q_k\delta_{jk}$, (3) 

in which $\delta_{jk}$ = 1 if j = k and $\delta_{jk}$ = 0 if j ≠ k is the Kronecker delta function. This system is 
depicted in Fig. 1, in which $z^{-1}$ is the unit delay operator. It is interesting to note 
that, at time k, the current state 

$x_k = A_{k-1}x_{k-1} + B_{k-1}w_{k-1}$ (4) 

does not involve $w_k$. That is, unlike continuous-time systems, here there is a one-step 
delay between the input and output sequences. The simpler case of $D_k$ = 0, 
namely, 

$y_k = C_kx_k$, (5) 

is again considered prior to the inclusion of a nonzero $D_k$. 


“Rudy Kalman applied the state-space model to the filtering problem, basically the same problem 
discussed by Wiener. The results were astonishing. The solution was recursive, and the fact that the 
estimates could use only the past of the observations posed no difficulties.” Jan. C. Willems 


4.3. The State Prediction Problem 
Suppose that noisy observations of (5) are available, that is, 

$z_k = y_k + v_k$, (6) 

where $v_k$ is a white measurement noise process with 

$E\{v_k\} = 0$, $E\{v_jv_k^T\} = R_k\delta_{jk}$ and $E\{w_jv_k^T\} = 0$. (7) 


Fig. 2. The state prediction problem. The objective is to design a predictor $\mathcal{H}$ which operates on the 
measurements and produces state estimates such that the variance of the error residual $e_{k/k-1}$ is 
minimised. 


It is noted above that, for the state recursion (4), there is a one-step delay between the 
current state and the input process. Similarly, it is expected that there will be a one-step 
delay between the current state estimate and the input measurement. 
Consequently, it is customary to denote $\hat{x}_{k/k-1}$ as the state estimate at time k, given 
measurements at time k - 1. The $\hat{x}_{k/k-1}$ is also known as the one-step-ahead state 
prediction. The objective here is to design a predictor $\mathcal{H}$ that operates on the 
measurements $z_k$ and produces an estimate, $\hat{y}_{k/k-1} = C_k\hat{x}_{k/k-1}$, of $y_k = C_kx_k$, so that 
the covariance, $E\{\tilde{y}_{k/k-1}\tilde{y}_{k/k-1}^T\}$, of the error residual, $\tilde{y}_{k/k-1} = y_k - \hat{y}_{k/k-1}$, is 
minimised. This problem is depicted in Fig. 2. 


“Prediction is very difficult, especially if it’s about the future.” Niels Henrik David Bohr 


4.4 The Discrete-time Conditional Mean Estimate 


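The predictor derivation that follows relies on the discrete-time version of the 
conditional-mean or least-mean-square estimate derived in Chapter 3, which is set 
out as follows. Consider a stochastic vector $[\alpha_k^T \; \beta_k^T]^T$ having means and 
covariances 

$E\left\{\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}\right\} = \begin{bmatrix} \bar{\alpha} \\ \bar{\beta} \end{bmatrix}$ (8) 

and 

$E\left\{\begin{bmatrix} \alpha_k - \bar{\alpha} \\ \beta_k - \bar{\beta} \end{bmatrix}\begin{bmatrix} \alpha_k^T - \bar{\alpha}^T & \beta_k^T - \bar{\beta}^T \end{bmatrix}\right\} = \begin{bmatrix} \Sigma_{\alpha\alpha} & \Sigma_{\alpha\beta} \\ \Sigma_{\beta\alpha} & \Sigma_{\beta\beta} \end{bmatrix}$, (9) 

respectively, where $\Sigma_{\beta\alpha} = \Sigma_{\alpha\beta}^T$. An estimate of $\alpha_k$ given $\beta_k$, denoted by 
$E\{\alpha_k | \beta_k\}$, which minimises $E\{(\alpha_k - E\{\alpha_k | \beta_k\})(\alpha_k - E\{\alpha_k | \beta_k\})^T\}$, is given 
by 

$E\{\alpha_k | \beta_k\} = \bar{\alpha} + \Sigma_{\alpha\beta}\Sigma_{\beta\beta}^{-1}(\beta_k - \bar{\beta})$. (10) 


The above formula is developed in [3] and established for Gaussian distributions 
in [4]. A derivation is requested in the problems. If $\alpha_k$ and $\beta_k$ are scalars then (10) 
degenerates to the linear regression formula as is demonstrated below. 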

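Example 1 (Linear regression [5]). The least-squares estimate $\hat{\alpha}_k = a\beta_k + b$ of 
$\alpha_k$ given data $\alpha_k$, $\beta_k \in \mathbb{R}$ over [1, N], can be found by minimising the 
performance objective $J = \frac{1}{N}\sum_{k=1}^{N}(\alpha_k - \hat{\alpha}_k)^2 = \frac{1}{N}\sum_{k=1}^{N}(\alpha_k - a\beta_k - b)^2$. Setting 
$\frac{\partial J}{\partial b} = 0$ yields $b = \bar{\alpha} - a\bar{\beta}$. Setting $\frac{\partial J}{\partial a} = 0$, substituting for b and using the 
definitions (8) - (9), results in $a = \Sigma_{\alpha\beta}\Sigma_{\beta\beta}^{-1}$. 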
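
The scalar case of (10) can be illustrated numerically. The sketch below assumes that the means and covariances in (8) - (9) are replaced by sample averages over a small, made-up data set; the function names and the data values are illustrative only.

```python
def sample_moments(alpha, beta):
    """Sample versions of the means (8) and covariances (9) for scalar data."""
    n = len(alpha)
    a_bar = sum(alpha) / n
    b_bar = sum(beta) / n
    s_ab = sum((x - a_bar) * (y - b_bar) for x, y in zip(alpha, beta)) / n
    s_bb = sum((y - b_bar) ** 2 for y in beta) / n
    return a_bar, b_bar, s_ab, s_bb

def conditional_mean(beta_k, a_bar, b_bar, s_ab, s_bb):
    """Scalar form of (10): alpha_bar + S_ab / S_bb * (beta_k - beta_bar)."""
    return a_bar + s_ab / s_bb * (beta_k - b_bar)

# Made-up data for illustration.
beta = [0.0, 1.0, 2.0, 3.0, 4.0]
alpha = [1.1, 2.9, 5.2, 6.8, 9.0]

a_bar, b_bar, s_ab, s_bb = sample_moments(alpha, beta)
slope = s_ab / s_bb                  # regression coefficient a
intercept = a_bar - slope * b_bar    # regression offset b
residuals = [x - (slope * y + intercept) for x, y in zip(alpha, beta)]
print(slope, intercept)
```

The slope and intercept obtained from (10) satisfy the two normal equations of the least-squares problem, that is, the residuals sum to zero and are orthogonal to the data.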


“T admired Bohr very much. We had long talks together, long talks in which Bohr did practically all the 
talking.” Paul Adrien Maurice Dirac 
4.5 Minimum-Variance Prediction 


It follows from (1), (6), together with the assumptions $E\{w_k\} = 0$, $E\{v_k\} = 0$, that 
$E\{x_{k+1}\} = E\{A_kx_k\}$ and $E\{z_k\} = E\{C_kx_k\}$. It is assumed that similar results hold in 
the case of predicted state estimates, that is, 

$E\left\{\begin{bmatrix} x_{k+1} \\ z_k \end{bmatrix}\right\} = \begin{bmatrix} A_k\hat{x}_{k/k-1} \\ C_k\hat{x}_{k/k-1} \end{bmatrix}$. (11) 

Substituting (11) into (10) and denoting $\hat{x}_{k+1/k} = E\{x_{k+1} | z_k\}$ yields the predicted 
state 

$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1})$, (12) 

where $K_k = E\{\tilde{x}_{k+1}z_k^T\}E\{z_kz_k^T\}^{-1}$ is known as the predictor gain, which is 
designed in the next section. Thus, the optimal one-step-ahead predictor follows 
immediately from the least-mean-square (or conditional mean) formula. A more 
detailed derivation appears in [4]. The structure of the optimal predictor is shown 
in Fig. 3. It can be seen from the figure that $\mathcal{H}$ produces estimates $\hat{y}_{k/k-1} = C_k\hat{x}_{k/k-1}$ 
from the measurements $z_k$. 


Let $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$ denote the state prediction error. It is shown below that the 
expectation of the prediction error is zero, that is, the predicted state estimate is 
unbiased. 


Fig. 3. The optimal one-step-ahead predictor which produces estimates $\hat{x}_{k+1/k}$ of $x_{k+1}$ given 
measurements $z_k$. 


Lemma 1: Suppose that $\hat{x}_{0/0} = x_0$, then 

$E\{\tilde{x}_{k+1/k}\} = 0$ (13) 

for all $k \in [0, N]$. 


Proof: The condition $\hat{x}_{0/0} = x_0$ is equivalent to $\tilde{x}_{0/0} = 0$, which is the initialisation 
step for an induction argument. Subtracting (12) from (1) gives 

$\tilde{x}_{k+1/k} = (A_k - K_kC_k)\tilde{x}_{k/k-1} + B_kw_k - K_kv_k$ (14) 

and therefore 

$E\{\tilde{x}_{k+1/k}\} = (A_k - K_kC_k)E\{\tilde{x}_{k/k-1}\} + B_kE\{w_k\} - K_kE\{v_k\}$. (15) 


From assumptions (3) and (7), the last two terms of the right-hand-side of (15) are 
zero. Thus, (13) follows by induction. 


“When it comes to the future, there are three kinds of people: those who let it happen, those who make 
it happen, and those who wondered what happened.” John M. Richardson Jr. 


4.6 Design of the Predictor Gain 


It is shown below that the optimum predictor gain is that which minimises the 
prediction error covariance $E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$. 


Lemma 2: In respect of the estimation problem defined by (1), (3), (5) - (7), 
suppose there exist solutions $P_{k/k-1} = P_{k/k-1}^T \geq 0$ to the Riccati difference equation 

$P_{k+1/k} = A_kP_{k/k-1}A_k^T + B_kQ_kB_k^T - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}A_k^T$, (16) 

over [0, N], then the predictor gain 

$K_k = A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$ (17) 

within (12) minimises $P_{k/k-1} = E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$. 

Proof: Constructing $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$ using (3), (7), (14), $E\{\tilde{x}_{k/k-1}w_k^T\} = 0$ 
and $E\{\tilde{x}_{k/k-1}v_k^T\} = 0$ yields 

$P_{k+1/k} = (A_k - K_kC_k)P_{k/k-1}(A_k - K_kC_k)^T + B_kQ_kB_k^T + K_kR_kK_k^T$, (18) 

which can be rearranged to give 

$P_{k+1/k} = A_kP_{k/k-1}A_k^T - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}A_k^T + B_kQ_kB_k^T$ 
$+ (K_k - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1})(C_kP_{k/k-1}C_k^T + R_k)$ 
$\times (K_k - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1})^T$. (19) 

By inspection of (19), the predictor gain (17) minimises $P_{k+1/k}$. 


“To be creative you have to contribute something different from what you've done before. Your results 
need not be original to the world; few results truly meet that criterion. In fact, most results are built on 
the work of others.” Lynne C. Levesque 
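
The completing-the-square argument of Lemma 2 can be checked numerically for a scalar model. The sketch below assumes scalar versions of (16) - (18) and uses illustrative parameter values (A = 0.9, B = C = Q = R = 1 and a starting variance of 1); the function names are not from the text.

```python
def riccati_step(p, a, b, c, q, r):
    """One step of the Riccati difference equation (16) with the gain (17), scalar case."""
    omega = c * p * c + r
    k = a * p * c / omega
    p_next = a * p * a + b * q * b - a * p * c / omega * c * p * a
    return p_next, k

def covariance_for_gain(p, k, a, b, c, q, r):
    """Prediction error variance (18) that results from an arbitrary gain k."""
    return (a - k * c) * p * (a - k * c) + b * q * b + k * r * k

a, b, c, q, r = 0.9, 1.0, 1.0, 1.0, 1.0
p = 1.0
p_opt, k_opt = riccati_step(p, a, b, c, q, r)

# Perturbing the gain in either direction should increase the variance in (18).
p_high = covariance_for_gain(p, k_opt + 0.1, a, b, c, q, r)
p_low = covariance_for_gain(p, k_opt - 0.1, a, b, c, q, r)
print(k_opt, p_opt, p_low, p_high)
```

With these values the optimal gain is 0.45 and the minimum variance is 1.405; either perturbed gain gives 1.425, which exceeds the minimum by the quadratic term in (19).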


4.7 Minimum-Variance Filtering 


It can be seen from (12) that the predicted state estimate $\hat{x}_{k/k-1}$ is calculated using 
the previous measurement $z_{k-1}$ as opposed to the current data $z_k$. A state estimate, 
given the data at time k, which is known as the filtered state, can similarly be 
obtained using the linear least squares or conditional-mean formula. In Lemma 1 it 
was shown that the predicted state estimate is unbiased. Therefore, it is assumed 
that the expected value of the filtered state equals the expected value of the 
predicted state, namely, 

$E\left\{\begin{bmatrix} x_k \\ z_k \end{bmatrix}\right\} = \begin{bmatrix} \hat{x}_{k/k-1} \\ C_k\hat{x}_{k/k-1} \end{bmatrix}$. (20) 

Substituting (20) into (10) and denoting $\hat{x}_{k/k} = E\{x_k | z_k\}$ yields the filtered 
estimate 

$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1})$, (21) 

where $L_k = E\{\tilde{x}_kz_k^T\}E\{z_kz_k^T\}^{-1}$ is known as the filter gain, which is designed 
subsequently. Let $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$ denote the filtered state error. It is shown below 
that the expectation of the filtered error is zero, that is, the filtered state estimate is 
unbiased. 


Lemma 3: Suppose that $\hat{x}_{0/0} = x_0$, then 

$E\{\tilde{x}_{k/k}\} = 0$ (22) 

for all $k \in [0, N]$. 


Proof: Following the approach of [6], combining (4) - (6) results in $z_k = C_kA_{k-1}x_{k-1} 
+ C_kB_{k-1}w_{k-1} + v_k$, which together with (21) yields 

$\tilde{x}_{k/k} = (I - L_kC_k)A_{k-1}\tilde{x}_{k-1/k-1} + (I - L_kC_k)B_{k-1}w_{k-1} - L_kv_k$. (23) 

From (23) and the assumptions (3), (7), it follows that 

$E\{\tilde{x}_{k/k}\} = (I - L_kC_k)A_{k-1}E\{\tilde{x}_{k-1/k-1}\}$ 
$= (I - L_kC_k)A_{k-1} \cdots (I - L_1C_1)A_0E\{\tilde{x}_{0/0}\}$. (24) 

Hence, with the initial condition $\hat{x}_{0/0} = x_0$, $E\{\tilde{x}_{k/k}\} = 0$. 


“A professor is one who can speak on any subject - for precisely fifty minutes.” Norbert Wiener 


4.8 Design of the Filter Gain 


It is shown below that the optimum filter gain is that which minimises the 
covariance $E\{\tilde{x}_{k/k}\tilde{x}_{k/k}^T\}$, where $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$ is the filter error. 


Lemma 4: In respect of the estimation problem defined by (1), (3), (5) - (7), 
suppose there exists a solution $P_{k/k} = P_{k/k}^T \geq 0$ to the Riccati difference equation 

$P_{k/k} = P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}$, (25) 

over [0, N], then the filter gain 

$L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$ (26) 

within (21) minimises $P_{k/k} = E\{\tilde{x}_{k/k}\tilde{x}_{k/k}^T\}$. 


Proof: Subtracting $\hat{x}_{k/k}$ from $x_k$ yields $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k} = x_k - \hat{x}_{k/k-1} - L_k(C_kx_k + v_k - C_k\hat{x}_{k/k-1})$, that is, 

$\tilde{x}_{k/k} = (I - L_kC_k)\tilde{x}_{k/k-1} - L_kv_k$ (27) 

and 

$P_{k/k} = (I - L_kC_k)P_{k/k-1}(I - L_kC_k)^T + L_kR_kL_k^T$, (28) 

which can be rearranged as 

$P_{k/k} = P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}$ 
$+ (L_k - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1})(C_kP_{k/k-1}C_k^T + R_k)$ 
$\times (L_k - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1})^T$. (29) 

By inspection of (29), the filter gain (26) minimises $P_{k/k}$. 


“Before the advent of the Kalman filter, most mathematical work was based on Norbert Wiener's ideas, 
but the 'Wiener filtering’ had proved difficult to apply. Kalman's approach, based on the use of state 
space techniques and a recursive least-squares algorithm, opened up many new theoretical and 
practical possibilities. The impact of Kalman filtering on all areas of applied mathematics, engineering, 
and sciences has been tremendous.” Eduardo Daniel Sontag 


Example 2 (Data Fusion). Consider a filtering problem in which there are two 
measurements of the same state variable (possibly from different sensors), namely 
$A_k, B_k, Q_k \in \mathbb{R}$, $C_k = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $R_k = \begin{bmatrix} R_{1,k} & 0 \\ 0 & R_{2,k} \end{bmatrix}$, with $R_{1,k}, R_{2,k} \in \mathbb{R}$. Let $P_{k/k-1}$ 
denote the solution of the Riccati difference equation (25). By applying Cramer’s 
rule within (26) it can be found that the filter gain is given by 

$L_k = \begin{bmatrix} \dfrac{R_{2,k}P_{k/k-1}}{R_{2,k}P_{k/k-1} + R_{1,k}P_{k/k-1} + R_{1,k}R_{2,k}} & \dfrac{R_{1,k}P_{k/k-1}}{R_{2,k}P_{k/k-1} + R_{1,k}P_{k/k-1} + R_{1,k}R_{2,k}} \end{bmatrix}$, 

from which it follows that $\lim_{R_{1,k} \to 0, \, R_{2,k} \neq 0} L_k = [1 \;\; 0]$ and $\lim_{R_{2,k} \to 0, \, R_{1,k} \neq 0} L_k = [0 \;\; 1]$. That 
is, when the first measurement is noise free, the filter ignores the second 
measurement and vice versa. Thus, the Kalman filter weights the data according to 
the prevailing measurement qualities. 
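
The gain expression of Example 2 can be cross-checked against a direct evaluation of (26). The sketch below assumes a scalar prediction variance p and writes out the 2 x 2 inverse by hand; the function names and the test values are illustrative.

```python
def fusion_gain(p, r1, r2):
    """Closed-form gain of Example 2 for two measurements of one scalar state."""
    det = r2 * p + r1 * p + r1 * r2
    return (r2 * p / det, r1 * p / det)

def fusion_gain_direct(p, r1, r2):
    """Gain (26), L = P C^T (C P C^T + R)^{-1}, with C = [1, 1]^T written out."""
    o11, o12, o22 = p + r1, p, p + r2          # entries of C P C^T + R
    det = o11 * o22 - o12 * o12
    inv = ((o22 / det, -o12 / det), (-o12 / det, o11 / det))
    # Row vector P * [1, 1] times the inverse.
    return (p * (inv[0][0] + inv[1][0]), p * (inv[0][1] + inv[1][1]))

gain = fusion_gain(2.0, 0.5, 1.5)
gain_direct = fusion_gain_direct(2.0, 0.5, 1.5)
gain_clean_first = fusion_gain(2.0, 1e-12, 1.5)   # first sensor almost noise free
print(gain, gain_direct, gain_clean_first)
```

When the first measurement noise variance is made very small, the gain approaches [1 0], which is the limiting behaviour described in the example.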


4.9 The Predictor-Corrector Form 


The Kalman filter may be written in the following predictor-corrector form. The 
corrected (or filtered) error covariances and states are respectively given by 

$P_{k/k} = P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}$ 
$= P_{k/k-1} - L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T$ 
$= (I - L_kC_k)P_{k/k-1}$, (30) 

$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1})$ 
$= (I - L_kC_k)\hat{x}_{k/k-1} + L_kz_k$, (31) 

where $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. Equation (31) is also known as the 
measurement update. The predicted state and error covariances are respectively 
given by 

$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k} = (A_k - K_kC_k)\hat{x}_{k/k-1} + K_kz_k$, (32) 
$P_{k+1/k} = A_kP_{k/k}A_k^T + B_kQ_kB_k^T$, (33) 

where $K_k = A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. It can be seen from (31) that the 
corrected estimate, $\hat{x}_{k/k}$, is obtained using measurements up to time k. This 
contrasts with the prediction at time k + 1 in (32), which is based on all previous 
measurements. The output estimate is given by 

$\hat{y}_{k/k} = C_k\hat{x}_{k/k}$ 
$= C_k\hat{x}_{k/k-1} + C_kL_k(z_k - C_k\hat{x}_{k/k-1})$ 
$= C_k(I - L_kC_k)\hat{x}_{k/k-1} + C_kL_kz_k$. (34) 


“I have been aware from the outset that the deep analysis of something which is now called Kalman 
filtering was of major importance. But even with this immodesty I did not quite anticipate all the 
reactions to this work.” Rudolf Emil Kalman 
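
The predictor-corrector recursions (30) - (33) translate directly into a loop. The following is a minimal scalar sketch, assuming time-invariant parameters A = 0.9, B = C = Q = R = 1 and an arbitrary measurement sequence; the function name `kalman_filter` and the data are illustrative only.

```python
import math

def kalman_filter(z, a, b, c, q, r, x_pred, p_pred):
    """Scalar predictor-corrector recursions (30) - (33).

    x_pred and p_pred are the initial prediction and its error variance.
    Returns the filtered pairs (x, P) and the one-step-ahead predicted pairs.
    """
    filtered, predicted = [], []
    for zk in z:
        gain_l = p_pred * c / (c * p_pred * c + r)      # filter gain below (31)
        x_filt = x_pred + gain_l * (zk - c * x_pred)    # measurement update (31)
        p_filt = (1.0 - gain_l * c) * p_pred            # corrected variance (30)
        x_pred = a * x_filt                             # predicted state (32)
        p_pred = a * p_filt * a + b * q * b             # predicted variance (33)
        filtered.append((x_filt, p_filt))
        predicted.append((x_pred, p_pred))
    return filtered, predicted

z = [0.5, -0.2, 1.3, 0.7, 0.1] * 10    # arbitrary illustrative measurements
p0 = 1.0
filtered, predicted = kalman_filter(z, 0.9, 1.0, 1.0, 1.0, 1.0, 0.0, p0)

# For these parameters the predicted variance settles at the positive
# root of P^2 - 0.81 P - 1 = 0.
p_steady = 0.405 + math.sqrt(0.405 ** 2 + 1.0)
print(predicted[-1][1], p_steady)
```

The error variances do not depend on the data, so the gains could equally be computed offline before any measurements arrive.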


4.10 The A Posteriori Filter 


The above predictor-corrector form is used in the construction of extended 
Kalman filters for nonlinear estimation problems (see Chapter 10). When state 
predictions are not explicitly required, the following one-line recursion for the 
filtered state can be employed. Substituting $\hat{x}_{k/k-1} = A_{k-1}\hat{x}_{k-1/k-1}$ into $\hat{x}_{k/k} = 
(I - L_kC_k)\hat{x}_{k/k-1} + L_kz_k$ yields $\hat{x}_{k/k} = (I - L_kC_k)A_{k-1}\hat{x}_{k-1/k-1} + L_kz_k$. Hence, the 
output estimator may be written as 

$\hat{x}_{k/k} = (I - L_kC_k)A_{k-1}\hat{x}_{k-1/k-1} + L_kz_k$, 
$\hat{y}_{k/k} = C_k\hat{x}_{k/k}$. (35) 

This form is called the a posteriori filter within [7], [8] and [9]. The absence of a 
direct feed-through matrix above reduces the complexity of the robust filter 
designs described in [7], [8] and [9]. 
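
The equivalence between the one-line recursion and the two-step predictor-corrector update can be confirmed numerically. The sketch below assumes a scalar time-invariant model with illustrative parameter values and data; the helper names are hypothetical.

```python
def filter_gains(n, a, b, c, q, r, p_pred):
    """Filter gains L_k obtained from the recursions (30) and (33), scalar case."""
    gains = []
    for _ in range(n):
        gain_l = p_pred * c / (c * p_pred * c + r)
        gains.append(gain_l)
        p_filt = (1.0 - gain_l * c) * p_pred
        p_pred = a * p_filt * a + b * q * b
    return gains

def a_posteriori(z, gains, a, c, x_filt):
    """One-line recursion: x_{k/k} = (1 - L c) a x_{k-1/k-1} + L z_k."""
    out = []
    for zk, gain_l in zip(z, gains):
        x_filt = (1.0 - gain_l * c) * a * x_filt + gain_l * zk
        out.append(x_filt)
    return out

def predictor_corrector(z, gains, a, c, x_filt):
    """Two-step form: predict with a, then correct with the innovation."""
    out = []
    for zk, gain_l in zip(z, gains):
        x_pred = a * x_filt
        x_filt = x_pred + gain_l * (zk - c * x_pred)
        out.append(x_filt)
    return out

z = [0.4, 1.1, -0.3, 0.8, 0.2, -0.6]   # arbitrary illustrative measurements
gains = filter_gains(len(z), 0.9, 1.0, 1.0, 1.0, 1.0, 1.0)
est_one_line = a_posteriori(z, gains, 0.9, 1.0, 0.0)
est_two_step = predictor_corrector(z, gains, 0.9, 1.0, 0.0)
print(est_one_line)
```

Both routines start from the same previous filtered state, so the two estimate sequences agree to within rounding error.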


4.11 The Information Form 


Algebraically equivalent recursions of the Kalman filter can be obtained by 
propagating a so-called corrected information state 

$\check{x}_{k/k} = P_{k/k}^{-1}\hat{x}_{k/k}$, (36) 

and a predicted information state 

$\check{x}_{k+1/k} = P_{k+1/k}^{-1}\hat{x}_{k+1/k}$. (37) 

The expression 

$(A + BCD)^{-1} = A^{-1} - A^{-1}B(C^{-1} + DA^{-1}B)^{-1}DA^{-1}$, (38) 

which is variously known as the Matrix Inversion Lemma, the Sherman-Morrison 
formula and Woodbury’s identity, is used to derive the information filter, see [3], 
[4], [11], [14] and [15]. To confirm the above identity, premultiply both sides of 
(38) by $(A + BCD)$ to obtain 

$I = I + BCDA^{-1} - B(C^{-1} + DA^{-1}B)^{-1}DA^{-1} - BCDA^{-1}B(C^{-1} + DA^{-1}B)^{-1}DA^{-1}$ 
$= I + BCDA^{-1} - B(I + CDA^{-1}B)(C^{-1} + DA^{-1}B)^{-1}DA^{-1}$ 
$= I + BCDA^{-1} - BC(C^{-1} + DA^{-1}B)(C^{-1} + DA^{-1}B)^{-1}DA^{-1}$, 

from which the result follows. From the above Matrix Inversion Lemma and (30) 
it follows that 

$P_{k/k}^{-1} = (P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1})^{-1}$ 
$= P_{k/k-1}^{-1} + C_k^TR_k^{-1}C_k$, (39) 

assuming that $P_{k/k-1}^{-1}$ and $R_k^{-1}$ exist. An expression for $P_{k+1/k}^{-1}$ can be obtained from 
the Matrix Inversion Lemma and (33), namely, 

$P_{k+1/k}^{-1} = (A_kP_{k/k}A_k^T + B_kQ_kB_k^T)^{-1}$ 
$= (F_k^{-1} + B_kQ_kB_k^T)^{-1}$, (40) 

where $F_k = (A_kP_{k/k}A_k^T)^{-1} = A_k^{-T}P_{k/k}^{-1}A_k^{-1}$, which gives 

$P_{k+1/k}^{-1} = (I - F_kB_k(B_k^TF_kB_k + Q_k^{-1})^{-1}B_k^T)F_k$. (41) 


“I have travelled the length and breadth of this country and talked with the best people, and I can 
assure you that data processing is a fad that won’t last out the year.” Editor in charge of business books 
for Prentice Hall, 1957. 


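Another useful identity is 

$(A + BCD)^{-1}BC = A^{-1}(I + BCDA^{-1})^{-1}BC$ 
$= A^{-1}B(I + CDA^{-1}B)^{-1}C$ 
$= A^{-1}B(C^{-1} + DA^{-1}B)^{-1}$. (42) 

From (42) and (39), the filter gain can be expressed as 

$L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$ 
$= (P_{k/k-1}^{-1} + C_k^TR_k^{-1}C_k)^{-1}C_k^TR_k^{-1}$ 
$= P_{k/k}C_k^TR_k^{-1}$. (43) 

Premultiplying (39) by $P_{k/k}$ and rearranging gives 

$I - L_kC_k = P_{k/k}P_{k/k-1}^{-1}$. (44) 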


It follows from (31), (36) and (44) that the corrected information state is given by 

$\check{x}_{k/k} = P_{k/k}^{-1}\hat{x}_{k/k}$ 
$= P_{k/k}^{-1}(I - L_kC_k)\hat{x}_{k/k-1} + P_{k/k}^{-1}L_kz_k$ 
$= \check{x}_{k/k-1} + C_k^TR_k^{-1}z_k$. (45) 

The predicted information state follows from (37), (41) and the definition of $F_k$, 
namely, 

$\check{x}_{k+1/k} = P_{k+1/k}^{-1}\hat{x}_{k+1/k}$ 
$= P_{k+1/k}^{-1}A_k\hat{x}_{k/k}$ 
$= (I - F_kB_k(B_k^TF_kB_k + Q_k^{-1})^{-1}B_k^T)F_kA_k\hat{x}_{k/k}$ 
$= (I - F_kB_k(B_k^TF_kB_k + Q_k^{-1})^{-1}B_k^T)A_k^{-T}\check{x}_{k/k}$. (46) 


“The fog of information can drive out knowledge.” Daniel Joseph Boorstin 


Recall from Lemma 1 and Lemma 3 that $E\{x_{k+1} - \hat{x}_{k+1/k}\} = 0$ and $E\{x_k - \hat{x}_{k/k}\} = 0$, 
provided $\hat{x}_{0/0} = x_0$. Similarly, with $\check{x}_{0/0} = P_{0/0}^{-1}x_0$, it follows that 
$E\{x_{k+1} - P_{k+1/k}\check{x}_{k+1/k}\} = 0$ and $E\{x_k - P_{k/k}\check{x}_{k/k}\} = 0$. That is, the information states 
(scaled by the appropriate covariances) will be unbiased, provided that the filter is 
suitably initialised. The calculation cost and potential for numerical instability can 
influence decisions on whether to implement the predictor-corrector form (30) - 
(33) or the information form (39) - (46) of the Kalman filter. The filters have 
similar complexity since both require a p × p matrix inverse in the measurement 
updates (31) and (45). However, inverting the measurement covariance matrix for 
the information filter may be troublesome when the measurement noise is 
negligible. 
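
The algebraic equivalence of the two forms can be illustrated for a scalar model. The sketch below assumes scalar versions of (30) - (33) and of (39), (41), (45) and (46), a nonzero A, and made-up data; the variable `y` stands for the information state (the state estimate premultiplied by the inverse error variance), a naming choice made here for illustration.

```python
def covariance_form(z, a, b, c, q, r, x_pred, p_pred):
    """Filtered (x, P) pairs from the predictor-corrector form (30) - (33)."""
    out = []
    for zk in z:
        gain_l = p_pred * c / (c * p_pred * c + r)
        x_filt = x_pred + gain_l * (zk - c * x_pred)
        p_filt = (1.0 - gain_l * c) * p_pred
        out.append((x_filt, p_filt))
        x_pred = a * x_filt
        p_pred = a * p_filt * a + b * q * b
    return out

def information_form(z, a, b, c, q, r, x_pred, p_pred):
    """Filtered (x, P) pairs recovered from the information recursions."""
    i_pred = 1.0 / p_pred          # inverse predicted variance
    y_pred = i_pred * x_pred       # predicted information state (37)
    out = []
    for zk in z:
        i_filt = i_pred + c * c / r              # (39)
        y_filt = y_pred + c * zk / r             # (45)
        out.append((y_filt / i_filt, 1.0 / i_filt))
        f = i_filt / (a * a)                     # F_k, scalar case
        m = 1.0 - f * b / (b * f * b + 1.0 / q) * b
        i_pred = m * f                           # (41)
        y_pred = m * y_filt / a                  # (46)
    return out

z = [0.3, 0.9, -0.4, 1.2, 0.6]     # arbitrary illustrative measurements
cov = covariance_form(z, 0.9, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0)
info = information_form(z, 0.9, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0)
print(cov[-1], info[-1])
```

The division by R in the information update shows why that form becomes awkward when the measurement noise is negligible, whereas the covariance form only divides by C P C^T + R.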


4.12 Comparison with Recursive Least Squares 
The recursive least squares (RLS) algorithm is equivalent to the Kalman filter 
designed with the simplifications $A_k = I$ and $B_k = 0$; see the derivations within [10], 
[11]. For convenience, consider a more general RLS algorithm that retains the 
correct $A_k$ but relies on the simplifying assumption $B_k = 0$. Under these conditions, 
denote the RLS algorithm’s predictor gain by 

$\bar{K}_k = A_k\bar{P}_{k/k-1}C_k^T(C_k\bar{P}_{k/k-1}C_k^T + R_k)^{-1}$, (47) 

where $\bar{P}_{k/k-1}$ is obtained from the Riccati difference equation 

$\bar{P}_{k+1/k} = A_k\bar{P}_{k/k-1}A_k^T - A_k\bar{P}_{k/k-1}C_k^T(C_k\bar{P}_{k/k-1}C_k^T + R_k)^{-1}C_k\bar{P}_{k/k-1}A_k^T$. (48) 

It is argued below that the cost of the above model simplification is an increase in 
mean-square-error. 


Lemma 5: Let $P_{k+1/k}$ denote the predicted error covariance within (33) for the 
optimal filter. Under the above conditions, the predicted error covariance, $\bar{P}_{k+1/k}$, 
of the RLS algorithm satisfies 

$P_{k+1/k} \leq \bar{P}_{k+1/k}$. (49) 


Proof: From the approach of Lemma 2, the RLS algorithm’s predicted error 
covariance is given by 

$\bar{P}_{k+1/k} = A_kP_{k/k-1}A_k^T - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}A_k^T + B_kQ_kB_k^T$ 
$+ (\bar{K}_k - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1})(C_kP_{k/k-1}C_k^T + R_k)$ 
$\times (\bar{K}_k - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1})^T$. (50) 

The last term on the right-hand-side of (50) is nonzero since the above RLS 
algorithm relies on the erroneous assumption $B_kQ_kB_k^T = 0$. Therefore (49) 
follows. 


“Information is the oxygen of the modern age. It seeps through the walls topped by barbed wire, it 
wafts across the electrified borders.” Ronald Wilson Reagan 
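
The cost of the RLS simplification can be seen in a scalar simulation of the error variances. The sketch below is an illustration under stated assumptions: the RLS gain is designed from (47) - (48) with B = 0, and the variance that actually results from applying that gain to the true model (B = 1) is propagated with (18). The parameter values and names are illustrative.

```python
def error_variances(n, a, b, c, q, r, p0):
    """Optimal and RLS prediction error variances for a scalar model."""
    p_opt = p0       # optimal filter, recursion (16)
    p_design = p0    # RLS design recursion (48), which assumes B = 0
    p_actual = p0    # variance actually achieved by the RLS gain, from (18)
    history = []
    for _ in range(n):
        k_rls = a * p_design * c / (c * p_design * c + r)             # (47)
        p_design = a * p_design * a - k_rls * (c * p_design * c + r) * k_rls
        p_actual = ((a - k_rls * c) * p_actual * (a - k_rls * c)
                    + b * q * b + k_rls * r * k_rls)
        k_opt = a * p_opt * c / (c * p_opt * c + r)                   # (17)
        p_opt = a * p_opt * a + b * q * b - k_opt * (c * p_opt * c + r) * k_opt
        history.append((p_opt, p_actual))
    return history

history = error_variances(50, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0)
print(history[0], history[-1])
```

In this example the RLS design variance decays towards zero, so the RLS gain switches off and the achieved variance drifts towards the open-loop value, while the optimal filter settles at a much lower level.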


4.13 Repeated Predictions 


When there are gaps in the data record, or the data is irregularly spaced, state 
predictions can be calculated an arbitrary number of steps ahead. The one-step-ahead 
prediction is given by (32). The two, three and j-step-ahead predictions, 
given data at time k, are calculated as 

$\hat{x}_{k+2/k} = A_{k+1}\hat{x}_{k+1/k}$, (51) 
$\hat{x}_{k+3/k} = A_{k+2}\hat{x}_{k+2/k}$, (52) 
$\hat{x}_{k+j/k} = A_{k+j-1}\hat{x}_{k+j-1/k}$, (53) 

see also [4], [12]. The corresponding predicted error covariances are given by 

$P_{k+2/k} = A_{k+1}P_{k+1/k}A_{k+1}^T + B_{k+1}Q_{k+1}B_{k+1}^T$, (54) 
$P_{k+3/k} = A_{k+2}P_{k+2/k}A_{k+2}^T + B_{k+2}Q_{k+2}B_{k+2}^T$, (55) 
$P_{k+j/k} = A_{k+j-1}P_{k+j-1/k}A_{k+j-1}^T + B_{k+j-1}Q_{k+j-1}B_{k+j-1}^T$. (56) 


“All of the books in the world contain no more information than is broadcast as video in a single large 
American city in a single year. Not all bits have equal value.” Carl Edward Sagan 
Another way to handle missing measurements at time i is to set $C_i = 0$, which 
leads to the same predicted states and error covariances. However, the cost of 
relying on repeated predictions is an increased mean-square-error which is 
demonstrated below. 


Lemma 6: 
(i) $P_{k/k} \leq P_{k/k-1}$. 

(ii) Suppose that 

$A_kA_k^T + B_kQ_kB_k^T \geq I$ (57) 

for all $k \in [0, N]$, then $P_{k+j/k} \geq P_{k+j-1/k}$ for all $(j + k) \in [0, N]$. 

Proof: 
(i) The claim follows by inspection of (30) since 
$L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T \geq 0$. Thus, the filter outperforms the 
one-step-ahead predictor. 

(ii) For $P_{k+j-1/k} \geq 0$, condition (57) yields $A_{k+j-1}P_{k+j-1/k}A_{k+j-1}^T + 
B_{k+j-1}Q_{k+j-1}B_{k+j-1}^T \geq P_{k+j-1/k}$, which together with (56) results in 
$P_{k+j/k} \geq P_{k+j-1/k}$. 

Example 3. Consider a filtering problem where A = 0.9 and B = C = Q = R = 1, for 
which $AA^T + BQB^T$ = 1.81 > 1. The predicted error covariances, $P_{k+j/k}$, j = 1 ... 
10, are plotted in Fig. 4. The monotonically increasing sequence of error variances 
shown in the figure demonstrates that degraded performance occurs during 
repeated predictions. Fig. 5 shows some sample trajectories of the model output 
(dotted line), filter output (crosses) and predictions (circles) assuming that z3 ... zs 
are unavailable. It can be seen from the figure that the prediction error increases 
with time k, which illustrates Lemma 6. 
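
The growth of the predicted error variances in Example 3 can be reproduced with the recursion (56). The sketch below assumes that the filter has reached steady state before the measurements stop, which is an assumption made here for illustration and may differ from the conditions used to generate Fig. 4.

```python
a, b, c, q, r = 0.9, 1.0, 1.0, 1.0, 1.0

# Run the recursions (30) and (33) until the one-step-ahead variance settles.
p = 1.0
for _ in range(200):
    p_filt = p - p * c / (c * p * c + r) * c * p
    p = a * p_filt * a + b * q * b

# Repeated predictions (56): P_{k+j/k} for j = 1, ..., 10.
variances = [p]
for j in range(2, 11):
    variances.append(a * variances[-1] * a + b * q * b)

limit = b * q * b / (1.0 - a * a)   # variance of the unobserved state process
print(variances, limit)
```

The sequence increases monotonically and approaches the open-loop state variance, which is the most that can be said about the state once the measurements have been unavailable for a long time.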


“Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tones, 
computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1.5 tons.” Popular 
Mechanics, 1949 
Fig. 4. Predicted error variances for Example 3. 

Fig. 5. Sample trajectories for Example 3: $y_k$ (dotted line), $\hat{y}_{k/k}$ (crosses) and $\hat{y}_{k+j/k}$ (circles). 


4.14 Accommodating Deterministic Inputs 
Suppose that the signal model is described by 

$x_{k+1} = A_kx_k + B_kw_k + \mu_k$, (58) 
$y_k = C_kx_k + \pi_k$, (59) 

where $\mu_k$ and $\pi_k$ are deterministic inputs (such as known non-zero means). The 
modifications to the Kalman recursions can be found by assuming $\hat{x}_{k+1/k} = A_k\hat{x}_{k/k} 
+ \mu_k$ and $\hat{y}_{k/k-1} = C_k\hat{x}_{k/k-1} + \pi_k$. The filtered and predicted states are then given 
by 

$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1} - \pi_k)$ (60) 

and 

$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k} + \mu_k$ (61) 
$= A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1} - \pi_k) + \mu_k$, (62) 

respectively. Subtracting (62) from (58) gives 

$\tilde{x}_{k+1/k} = A_k\tilde{x}_{k/k-1} - K_k(C_k\tilde{x}_{k/k-1} + \pi_k + v_k - \pi_k) + B_kw_k + \mu_k - \mu_k$ 
$= (A_k - K_kC_k)\tilde{x}_{k/k-1} + B_kw_k - K_kv_k$, (63) 

where $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$. Therefore, the predicted error covariance, 

$P_{k+1/k} = (A_k - K_kC_k)P_{k/k-1}(A_k - K_kC_k)^T + B_kQ_kB_k^T + K_kR_kK_k^T$ 
$= A_kP_{k/k-1}A_k^T - K_k(C_kP_{k/k-1}C_k^T + R_k)K_k^T + B_kQ_kB_k^T$, (64) 

is unchanged. The filtered output is given by 

$\hat{y}_{k/k} = C_k\hat{x}_{k/k} + \pi_k$. (65) 


“I think there is a world market for maybe five computers.” Thomas John Watson 
Fig. 6. Measurements (dotted line) and filtered states (solid line) for Example 4 (axis label: $z_{1,k}$, $\hat{x}_{1,k/k}$). 


Example 4. Consider a filtering problem where A = diag(0.1, 0.1), B = C = diag(1, 
1), Q = R = diag(0.001, 0.001), with $\mu_k = \begin{bmatrix} \sin(2k) \\ \cos(3k) \end{bmatrix}$. The filtered states calculated 
from (60) are shown in Fig. 6. The resulting Lissajous figure illustrates that states 
having nonzero means can be modelled using deterministic inputs. 


4.15 Correlated Process and Measurement Noises 


Consider the case where the process and measurement noises are correlated 

$E\left\{\begin{bmatrix} w_j \\ v_j \end{bmatrix}\begin{bmatrix} w_k^T & v_k^T \end{bmatrix}\right\} = \begin{bmatrix} Q_k & S_k \\ S_k^T & R_k \end{bmatrix}\delta_{jk}$. (66) 

The generalisation of the optimal filter that takes the above into account was 
published by Kalman in 1963 [2]. The expressions for the state prediction 

$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1})$ (67) 

and the state prediction error 

$\tilde{x}_{k+1/k} = (A_k - K_kC_k)\tilde{x}_{k/k-1} + B_kw_k - K_kv_k$ (68) 

remain the same. It follows from (68) that 

$E\{\tilde{x}_{k+1/k}\} = (A_k - K_kC_k)E\{\tilde{x}_{k/k-1}\} + \begin{bmatrix} B_k & -K_k \end{bmatrix}\begin{bmatrix} E\{w_k\} \\ E\{v_k\} \end{bmatrix}$. (69) 

As before, the optimum predictor gain is that which minimises the prediction error 
covariance $E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$. 


“There is no reason anyone would want a computer in their home.” Kenneth Harry Olson 

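Lemma 7: In respect of the estimation problem defined by (1), (5), (6) with noise
covariance (66), suppose there exist solutions $P_{k/k-1} = P_{k/k-1}^T \geq 0$ to the Riccati
difference equation

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T + B_kQ_kB_k^T - (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1}(A_kP_{k/k-1}C_k^T + B_kS_k)^T \qquad (70)$$

over [0, N], then the state prediction (67) with the gain

$$K_k = (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1} \qquad (71)$$

minimises $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$.


Proof: It follows from (69) that

$$\begin{aligned} E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\} &= (A_k - K_kC_k)E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}(A_k - K_kC_k)^T + \begin{bmatrix} B_k & -K_k \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k^T & R_k \end{bmatrix} \begin{bmatrix} B_k^T \\ -K_k^T \end{bmatrix} \\ &= (A_k - K_kC_k)E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}(A_k - K_kC_k)^T + B_kQ_kB_k^T + K_kR_kK_k^T - B_kS_kK_k^T - K_kS_k^TB_k^T . \qquad (72) \end{aligned}$$

Expanding (72) and denoting $P_{k/k-1} = E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$ gives

$$\begin{aligned} P_{k+1/k} &= A_kP_{k/k-1}A_k^T + B_kQ_kB_k^T - (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1}(A_kP_{k/k-1}C_k^T + B_kS_k)^T \\ &\quad + \left(K_k - (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1}\right)(C_kP_{k/k-1}C_k^T + R_k) \\ &\quad \times \left(K_k - (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1}\right)^T . \qquad (73) \end{aligned}$$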


By inspection of (73), the predictor gain (71) minimises $P_{k+1/k}$.


Thus, the predictor gain is calculated differently when $w_k$ and $v_k$ are correlated.
The calculations of the filtered state and filtered error covariance are unchanged,
viz.

$$\hat{x}_{k/k} = (I - L_kC_k)\hat{x}_{k/k-1} + L_kz_k , \qquad (74)$$
$$P_{k/k} = (I - L_kC_k)P_{k/k-1}(I - L_kC_k)^T + L_kR_kL_k^T , \qquad (75)$$

where

$$L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1} . \qquad (76)$$

However, $P_{k/k-1}$ is now obtained from the Riccati difference equation (70).
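
The recursion of Lemma 7 can be sketched numerically as follows. This is a minimal illustration, assuming NumPy is available; the function name `correlated_predictor_step` and the second-order model values are hypothetical and are not taken from the text. The step evaluates the gain (71) and then the Riccati difference equation (70), written in the equivalent form $A_kP_{k/k-1}A_k^T + B_kQ_kB_k^T - K_k(C_kP_{k/k-1}C_k^T + R_k)K_k^T$.

```python
import numpy as np

def correlated_predictor_step(P, A, B, C, Q, R, S):
    """One step of the Riccati difference equation (70) with the gain (71)."""
    M = A @ P @ C.T + B @ S            # cross term A P C^T + B S
    Omega = C @ P @ C.T + R            # C P C^T + R
    K = M @ np.linalg.inv(Omega)       # predictor gain (71)
    P_next = A @ P @ A.T + B @ Q @ B.T - K @ Omega @ K.T   # equivalent to (70)
    return K, P_next

# Hypothetical second-order model with correlated noises (illustrative values only).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(2)
C = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)
R = np.array([[0.2]])
S = np.array([[0.05], [0.0]])          # S = E{w_k v_k^T}

P = np.eye(2)                          # assumed initial error covariance
for _ in range(50):
    K, P = correlated_predictor_step(P, A, B, C, Q, R, S)
print("predictor gain after 50 steps:", K.ravel())
```

Setting `S` to zero recovers the standard predictor gain, which gives a quick sanity check of an implementation.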


4.16 Including a Direct-Feedthrough Matrix 


Suppose now that the signal model possesses a direct-feedthrough matrix, $D_k$,
namely

$$x_{k+1} = A_kx_k + B_kw_k , \qquad (77)$$
$$y_k = C_kx_k + D_kw_k . \qquad (78)$$

Let the observations be denoted by

$$z_k = C_kx_k + \bar{v}_k , \qquad (79)$$

where $\bar{v}_k = D_kw_k + v_k$, under the assumptions (3) and (7). It follows that

$$E\left\{ \begin{bmatrix} w_j \\ \bar{v}_j \end{bmatrix} \begin{bmatrix} w_k^T & \bar{v}_k^T \end{bmatrix} \right\} = \begin{bmatrix} Q_k & Q_kD_k^T \\ D_kQ_k & D_kQ_kD_k^T + R_k \end{bmatrix} \delta_{jk} . \qquad (80)$$

The approach of the previous section may be used to obtain the minimum-variance
predictor for the above system. Using (80) within Lemma 7 yields the predictor
gain

$$K_k = (A_kP_{k/k-1}C_k^T + B_kQ_kD_k^T)\Omega_k^{-1} , \qquad (81)$$

where

$$\Omega_k = C_kP_{k/k-1}C_k^T + D_kQ_kD_k^T + R_k \qquad (82)$$

and $P_{k/k-1}$ is the solution of the Riccati difference equation

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\Omega_kK_k^T + B_kQ_kB_k^T . \qquad (83)$$

The filtered states can be calculated from (74), (82), (83) and $L_k = P_{k/k-1}C_k^T\Omega_k^{-1}$.


Fig. 7. The general filtering problem. The objective is to estimate the output of $\mathcal{G}_1$ from noisy
measurements of the output of $\mathcal{G}_2$.


4.17 Solution of the General Filtering Problem 


The general filtering problem is shown in Fig. 7, in which it is desired to develop a
filter $\mathcal{H}$ that operates on noisy measurements of $\mathcal{G}_2$ and estimates the output of
$\mathcal{G}_1$. Frequency domain solutions for time-invariant systems were developed in
Chapters 1 and 2. Here, for the time-varying case, it is assumed that the system
$\mathcal{G}_2$ has the state-space realisation

$$x_{k+1} = A_kx_k + B_kw_k , \qquad (84)$$
$$y_{2,k} = C_{2,k}x_k + D_{2,k}w_k . \qquad (85)$$

Suppose that the system $\mathcal{G}_1$ has the realisation (84) and

$$y_{1,k} = C_{1,k}x_k + D_{1,k}w_k . \qquad (86)$$

The objective is to produce estimates $\hat{y}_{1,k/k}$ of $y_{1,k}$ from the measurements

$$z_k = C_{2,k}x_k + \bar{v}_k , \qquad (87)$$

where $\bar{v}_k = D_{2,k}w_k + v_k$, so that the variance of the estimation error,

$$\tilde{y}_{1,k/k} = y_{1,k} - \hat{y}_{1,k/k} , \qquad (88)$$

is minimised. The predicted state follows immediately from the results of the
previous sections, namely,

$$\begin{aligned} \hat{x}_{k+1/k} &= A_k\hat{x}_{k/k-1} + K_k(z_k - C_{2,k}\hat{x}_{k/k-1}) \\ &= (A_k - K_kC_{2,k})\hat{x}_{k/k-1} + K_kz_k , \qquad (89) \end{aligned}$$

where

$$K_k = (A_kP_{k/k-1}C_{2,k}^T + B_kQ_kD_{2,k}^T)\Omega_k^{-1} \qquad (90)$$

and

$$\Omega_k = C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k , \qquad (91)$$

in which $P_{k/k-1}$ evolves from

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\Omega_kK_k^T + B_kQ_kB_k^T . \qquad (92)$$

In view of the structure (89), an output estimate of the form

$$\begin{aligned} \hat{y}_{1,k/k} &= C_{1,k}\hat{x}_{k/k-1} + L_k(z_k - C_{2,k}\hat{x}_{k/k-1}) \\ &= (C_{1,k} - L_kC_{2,k})\hat{x}_{k/k-1} + L_kz_k \qquad (93) \end{aligned}$$

is sought, where $L_k$ is a filter gain to be designed. Subtracting (93) from (86) gives

$$\begin{aligned} \tilde{y}_{1,k/k} &= y_{1,k} - \hat{y}_{1,k/k} \\ &= (C_{1,k} - L_kC_{2,k})\tilde{x}_{k/k-1} + \begin{bmatrix} D_{1,k} & -L_k \end{bmatrix} \begin{bmatrix} w_k \\ \bar{v}_k \end{bmatrix} . \qquad (94) \end{aligned}$$

It is shown below that an optimum filter gain can be found by minimising the
output error covariance $E\{\tilde{y}_{1,k/k}\tilde{y}_{1,k/k}^T\}$.


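Lemma 8: In respect of the estimation problem defined by (84) - (88), the output
estimate $\hat{y}_{1,k/k}$ with the filter gain

$$L_k = (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1} \qquad (95)$$

minimises $E\{\tilde{y}_{1,k/k}\tilde{y}_{1,k/k}^T\}$.


Proof: It follows from (94) that

$$\begin{aligned} E\{\tilde{y}_{1,k/k}\tilde{y}_{1,k/k}^T\} &= (C_{1,k} - L_kC_{2,k})P_{k/k-1}(C_{1,k}^T - C_{2,k}^TL_k^T) + \begin{bmatrix} D_{1,k} & -L_k \end{bmatrix} \begin{bmatrix} Q_k & Q_kD_{2,k}^T \\ D_{2,k}Q_k & D_{2,k}Q_kD_{2,k}^T + R_k \end{bmatrix} \begin{bmatrix} D_{1,k}^T \\ -L_k^T \end{bmatrix} \\ &= C_{1,k}P_{k/k-1}C_{1,k}^T + D_{1,k}Q_kD_{1,k}^T - L_k(C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)^T \\ &\quad - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)L_k^T + L_k\Omega_kL_k^T , \qquad (96) \end{aligned}$$

which can be expanded to give

$$\begin{aligned} E\{\tilde{y}_{1,k/k}\tilde{y}_{1,k/k}^T\} &= C_{1,k}P_{k/k-1}C_{1,k}^T + D_{1,k}Q_kD_{1,k}^T \\ &\quad - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}(C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)^T \\ &\quad + \left(L_k - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}\right)\Omega_k \\ &\quad \times \left(L_k - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}\right)^T . \qquad (97) \end{aligned}$$

By inspection of (97), the filter gain (95) minimises $E\{\tilde{y}_{1,k/k}\tilde{y}_{1,k/k}^T\}$.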


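The filter gain (95) has been generalised to include arbitrary $C_{1,k}$, $D_{1,k}$ and $D_{2,k}$.
For state estimation, $C_{1,k} = I$ and $D_{1,k} = D_{2,k} = 0$, in which case (95) reverts to the simpler
form (26). The problem (84) - (88) can be written compactly in the following
generalised regulator framework from control theory [13].

$$\begin{bmatrix} x_{k+1} \\ \tilde{y}_{1,k/k} \\ z_k \end{bmatrix} = \begin{bmatrix} A_k & B_{1,1,k} & 0 \\ C_{1,1,k} & D_{1,1,k} & D_{1,2,k} \\ C_{2,1,k} & D_{2,1,k} & 0 \end{bmatrix} \begin{bmatrix} x_k \\ \begin{bmatrix} v_k \\ w_k \end{bmatrix} \\ \hat{y}_{1,k/k} \end{bmatrix} , \qquad (98)$$

where $B_{1,1,k} = [0 \;\; B_k]$, $C_{1,1,k} = C_{1,k}$, $C_{2,1,k} = C_{2,k}$, $D_{1,1,k} = [0 \;\; D_{1,k}]$, $D_{1,2,k} = -I$ and
$D_{2,1,k} = [I \;\; D_{2,k}]$. With the above definitions, the minimum-variance solution
can be written as

$$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_{2,1,k}\hat{x}_{k/k-1}) , \qquad (99)$$
$$\hat{y}_{1,k/k} = C_{1,1,k}\hat{x}_{k/k-1} + L_k(z_k - C_{2,1,k}\hat{x}_{k/k-1}) , \qquad (100)$$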


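where

$$K_k = \left(A_kP_{k/k-1}C_{2,1,k}^T + B_{1,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)\left(C_{2,1,k}P_{k/k-1}C_{2,1,k}^T + D_{2,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)^{-1} , \qquad (101)$$

$$L_k = \left(C_{1,1,k}P_{k/k-1}C_{2,1,k}^T + D_{1,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)\left(C_{2,1,k}P_{k/k-1}C_{2,1,k}^T + D_{2,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)^{-1} , \qquad (102)$$

in which $P_{k/k-1}$ is the solution of the Riccati difference equation

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\left(C_{2,1,k}P_{k/k-1}C_{2,1,k}^T + D_{2,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)K_k^T + B_{1,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}B_{1,1,k}^T . \qquad (103)$$
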

The application of the solution (99) — (100) to output estimation, input estimation 
(or equalisation), state estimation and mixed filtering problems is demonstrated in 
the example below. 


Fig. 8. The mixed filtering and equalisation problem considered in Example 5. The objective is to
estimate the output of the plant $\mathcal{G}_1$ which has been corrupted by the channel $\mathcal{G}_2$ and the
measurement noise $v_k$.


Example 5.

(i) For output estimation problems, where $C_{1,k} = C_{2,k}$ and $D_{1,k} = D_{2,k}$, the
predictor gain (101) and filter gain (102) are identical to the
previously derived (90) and (95), respectively.

(ii) For state estimation problems, set $C_{1,k} = I$ and $D_{1,k} = 0$.

(iii) For equalisation problems, set $C_{1,k} = 0$ and $D_{1,k} = I$.

(iv) Consider a mixed filtering and equalisation problem depicted in Fig.
8, where the output of the plant $\mathcal{G}_1$ has been corrupted by the
channel $\mathcal{G}_2$. Assume that $\mathcal{G}_1$ has the realisation

$$\begin{bmatrix} x_{1,k+1} \\ y_{1,k} \end{bmatrix} = \begin{bmatrix} A_{1,k} & B_{1,k} \\ C_{1,k} & D_{1,k} \end{bmatrix} \begin{bmatrix} x_{1,k} \\ w_k \end{bmatrix} .$$

Noting the realisation of the cascaded system $\mathcal{G}_2\mathcal{G}_1$ (see Problem 7), with the
states ordered as $[x_{2,k}^T \;\; x_{1,k}^T]^T$, the minimum-variance solution can
be found by setting

$$A_k = \begin{bmatrix} A_{2,k} & B_{2,k}C_{1,k} \\ 0 & A_{1,k} \end{bmatrix} , \quad B_{1,1,k} = \begin{bmatrix} 0 & B_{2,k}D_{1,k} \\ 0 & B_{1,k} \end{bmatrix} , \quad C_{1,1,k} = \begin{bmatrix} 0 & C_{1,k} \end{bmatrix} , \quad C_{2,1,k} = \begin{bmatrix} C_{2,k} & D_{2,k}C_{1,k} \end{bmatrix} ,$$

$D_{1,1,k} = [0 \;\; D_{1,k}]$ and $D_{2,1,k} = [I \;\; D_{2,k}D_{1,k}]$.


4.18 Hybrid Continuous-Discrete Filtering 


Often a system's dynamics evolve continuously but measurements can only be
observed in discrete time increments. This problem is modelled in [20] as

$$\dot{x}(t) = A(t)x(t) + B(t)w(t) , \qquad (104)$$
$$z_k = C_kx_k + v_k , \qquad (105)$$

where $E\{w(t)\} = 0$, $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, $E\{v_k\} = 0$, $E\{v_jv_k^T\} = R_k\delta_{jk}$ and $x_k = x(kT_s)$,
in which $T_s$ is the sampling interval. Following the approach of [20], state
estimates can be obtained from a hybrid of continuous-time and discrete-time
filtering equations. The predicted states and error covariances are obtained from

$$\dot{\hat{x}}(t) = A(t)\hat{x}(t) , \qquad (106)$$
$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t) . \qquad (107)$$

Define $\hat{x}_{k/k-1} = \hat{x}(t)$ and $P_{k/k-1} = P(t)$ at $t = kT_s$. The corrected states and error
covariances are given by

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}) , \qquad (108)$$
$$P_{k/k} = (I - L_kC_k)P_{k/k-1} , \qquad (109)$$

where $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. The above filter is a linear system having
jumps at the discrete observation times. The states evolve according to the
continuous-time dynamics (106) in-between the sampling instants. This filter is
applied in [20] for recovery of cardiac dynamics from medical image sequences.
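
The predict-then-correct structure of (106) - (109) can be sketched as below. This is only an illustration under stated assumptions: NumPy is assumed, the model is taken to be time-invariant, and the differential equations (106) - (107) are integrated with a simple forward-Euler rule, which is a choice made here and not the scheme of [20]. The function names, the oscillator model and the measurement values are hypothetical.

```python
import numpy as np

def propagate(x_hat, P, A, B, Q, Ts, substeps=100):
    """Forward-Euler integration of (106) and (107) over one sampling interval Ts."""
    dt = Ts / substeps
    for _ in range(substeps):
        x_hat = x_hat + dt * (A @ x_hat)                        # (106)
        P = P + dt * (A @ P + P @ A.T + B @ Q @ B.T)            # (107)
    return x_hat, P

def correct(x_pred, P_pred, z, C, R):
    """Discrete-time correction (108) - (109) at an observation instant."""
    L = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    x_corr = x_pred + L @ (z - C @ x_pred)                      # (108)
    P_corr = (np.eye(len(x_pred)) - L @ C) @ P_pred             # (109)
    return x_corr, P_corr

# Hypothetical damped oscillator observed through its first state.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.array([[0.1]])
C = np.array([[1.0, 0.0]])
R = np.array([[0.01]])
Ts = 0.1

x_hat, P = np.zeros(2), np.eye(2)
for z in ([0.9], [0.95], [0.97]):          # made-up measurements
    x_hat, P = propagate(x_hat, P, A, B, Q, Ts)
    x_hat, P = correct(x_hat, P, np.array(z), C, R)
print("corrected state:", x_hat)
```

A higher-order integrator could be substituted inside `propagate` without changing the correction step, since the two stages only exchange $\hat{x}_{k/k-1}$ and $P_{k/k-1}$.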


| | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and system | $E\{w_k\} = E\{v_k\} = 0$. $E\{w_kw_k^T\} = Q_k$ and $E\{v_kv_k^T\} = R_k$ are known. $A_k$, $B_k$, $C_{1,k}$, $C_{2,k}$, $D_{1,k}$, $D_{2,k}$ are known. | $x_{k+1} = A_kx_k + B_kw_k$, $y_{2,k} = C_{2,k}x_k + D_{2,k}w_k$, $z_k = y_{2,k} + v_k$, $y_{1,k} = C_{1,k}x_k + D_{1,k}w_k$ |
| Predicted state and output estimate | | $\hat{x}_{k+1/k} = (A_k - K_kC_{2,k})\hat{x}_{k/k-1} + K_kz_k$, $\hat{y}_{1,k/k} = (C_{1,k} - L_kC_{2,k})\hat{x}_{k/k-1} + L_kz_k$ |
| Predictor gain | $Q_k > 0$, $R_k > 0$. | $K_k = (A_kP_{k/k-1}C_{2,k}^T + B_kQ_kD_{2,k}^T)(C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k)^{-1}$ |
| Filter gain | | $L_k = (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)(C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k)^{-1}$ |
| Riccati difference equation | | $P_{k+1/k} = A_kP_{k/k-1}A_k^T + B_kQ_kB_k^T - K_k(C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k)K_k^T$ |


Table 1. Main results for the general filtering problem.
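
The recursions summarised in Table 1 can be sketched as a single update function. The code below is a minimal illustration assuming NumPy; the function name `general_filter_step`, the second-order model and the measurement values are hypothetical. It evaluates the predictor gain (90), the filter gain (95), the output estimate (93), the predicted state (89) and the Riccati difference equation (92).

```python
import numpy as np

def general_filter_step(x_pred, P, z, A, B, C1, C2, D1, D2, Q, R):
    """One step of the general filter; returns y1_hat, next x_pred and next P."""
    Omega = C2 @ P @ C2.T + D2 @ Q @ D2.T + R               # (91)
    Omega_inv = np.linalg.inv(Omega)
    K = (A @ P @ C2.T + B @ Q @ D2.T) @ Omega_inv           # predictor gain (90)
    L = (C1 @ P @ C2.T + D1 @ Q @ D2.T) @ Omega_inv         # filter gain (95)
    innovation = z - C2 @ x_pred
    y1_hat = C1 @ x_pred + L @ innovation                   # output estimate (93)
    x_next = A @ x_pred + K @ innovation                    # predicted state (89)
    P_next = A @ P @ A.T - K @ Omega @ K.T + B @ Q @ B.T    # (92)
    return y1_hat, x_next, P_next

# Hypothetical example: estimate the first state from a noisy sum of both states.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(2)
C2 = np.array([[1.0, 1.0]])
D2 = np.zeros((1, 2))
C1 = np.array([[1.0, 0.0]])
D1 = np.zeros((1, 2))
Q = 0.1 * np.eye(2)
R = np.array([[0.2]])

x_pred, P = np.zeros(2), np.eye(2)
for z in ([1.0], [0.8], [1.1]):            # made-up measurements
    y1_hat, x_pred, P = general_filter_step(
        x_pred, P, np.array(z), A, B, C1, C2, D1, D2, Q, R)
print("output estimate:", y1_hat)
```

Choosing `C1 = I` and `D1 = 0` turns the same function into a state estimator, and `C1 = 0`, `D1 = I` gives an input estimator, which mirrors the special cases listed in Example 5.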


4.19 Chapter Summary 


A linear, time-varying system $\mathcal{G}_2$ is assumed to have the realisation $x_{k+1} = A_kx_k + B_kw_k$
and $y_{2,k} = C_{2,k}x_k + D_{2,k}w_k$. In the general filtering problem, it is desired to
estimate the output of a second reference system $\mathcal{G}_1$ which is modelled as $y_{1,k} = C_{1,k}x_k + D_{1,k}w_k$.
The Kalman filter which estimates $y_{1,k}$ from the measurements $z_k = y_{2,k} + v_k$ at time k is listed in Table 1.


If the state-space parameters are known exactly then this filter minimises the
predicted and corrected error covariances $E\{(x_k - \hat{x}_{k/k-1})(x_k - \hat{x}_{k/k-1})^T\}$ and
$E\{(x_k - \hat{x}_{k/k})(x_k - \hat{x}_{k/k})^T\}$, respectively. When there are gaps in the data record,
or the data is irregularly spaced, state predictions can be calculated an arbitrary
number of steps ahead, at the cost of increased mean-square-error.


The filtering solution is specialised to output estimation with $C_{1,k} = C_{2,k}$ and $D_{1,k} = D_{2,k}$.


In the case of input estimation (or equalisation), $C_{1,k} = 0$ and $D_{1,k} = I$, which
results in $\hat{w}_{k/k} = -L_kC_{2,k}\hat{x}_{k/k-1} + L_kz_k$, where the filter gain is instead calculated as

$$L_k = Q_kD_{2,k}^T(C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k)^{-1} .$$

For problems where $C_{1,k} = I$ (state estimation) and $D_{1,k} = D_{2,k} = 0$, the filtered state
calculation simplifies to $\hat{x}_{k/k} = (I - L_kC_{2,k})\hat{x}_{k/k-1} + L_kz_k$, where $\hat{x}_{k/k-1} = A_{k-1}\hat{x}_{k-1/k-1}$
and $L_k = P_{k/k-1}C_{2,k}^T(C_{2,k}P_{k/k-1}C_{2,k}^T + R_k)^{-1}$. This predictor-corrector form is used to
obtain robust, hybrid and extended Kalman filters. When the predicted states are
not explicitly required, the state corrections can be calculated from the one-line
recursion $\hat{x}_{k/k} = (I - L_kC_{2,k})A_{k-1}\hat{x}_{k-1/k-1} + L_kz_k$.


If the simplifications $B_k = D_{2,k} = 0$ are assumed and the pair $(A_k, C_{2,k})$ is retained,
the Kalman filter degenerates to the RLS algorithm. However, the cost of this
model simplification is an increase in mean-square-error.


4.20 Problems 


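Problem 1. Suppose that $E\left\{\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}\right\} = \begin{bmatrix} \bar{\alpha} \\ \bar{\beta} \end{bmatrix}$ and
$E\left\{\begin{bmatrix} \alpha_k - \bar{\alpha} \\ \beta_k - \bar{\beta} \end{bmatrix}\begin{bmatrix} (\alpha_k - \bar{\alpha})^T & (\beta_k - \bar{\beta})^T \end{bmatrix}\right\} = \begin{bmatrix} \Sigma_{\alpha_k\alpha_k} & \Sigma_{\alpha_k\beta_k} \\ \Sigma_{\beta_k\alpha_k} & \Sigma_{\beta_k\beta_k} \end{bmatrix}$.
Show that an estimate of $\alpha_k$ given $\beta_k$, which minimises
$E\{(\alpha_k - E\{\alpha_k | \beta_k\})(\alpha_k - E\{\alpha_k | \beta_k\})^T\}$, is given by
$E\{\alpha_k | \beta_k\} = \bar{\alpha} + \Sigma_{\alpha_k\beta_k}\Sigma_{\beta_k\beta_k}^{-1}(\beta_k - \bar{\beta})$.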


Problem 2. Derive the predicted error covariance
$P_{k+1/k} = A_kP_{k/k-1}A_k^T - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}A_k^T + B_kQ_kB_k^T$ from
the state prediction $\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1})$, the model $x_{k+1} = A_kx_k + B_kw_k$,
$y_k = C_kx_k$ and the measurements $z_k = y_k + v_k$.


Problem 3. Assuming the state correction $\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1})$,
show that the corrected error covariance is given by
$P_{k/k} = P_{k/k-1} - L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T$.

Problem 4 [11], [14], [17], [18], [19]. Consider the standard filter equations

$$\hat{x}_{k/k-1} = A_k\hat{x}_{k-1/k-1} ,$$
$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}) ,$$
$$P_{k/k-1} = A_kP_{k-1/k-1}A_k^T + B_kQ_kB_k^T ,$$
$$P_{k/k} = P_{k/k-1} - L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T .$$

Derive the continuous-time filter equations, namely

$$\dot{\hat{x}}(t_k) = A(t_k)\hat{x}(t_k) + K(t_k)\left(z(t_k) - C(t_k)\hat{x}(t_k)\right) ,$$
$$\dot{P}(t_k) = A(t_k)P(t_k) + P(t_k)A^T(t_k) - P(t_k)C^T(t_k)R^{-1}(t_k)C(t_k)P(t_k) + B(t_k)Q(t_k)B^T(t_k) ,$$

where $K(t_k) = P(t_k)C^T(t_k)R^{-1}(t_k)$. (Hint: Introduce the quantities $A_k = I + A(t_k)\Delta t$,
$B(t_k) = B_k$, $C(t_k) = C_k$, $Q(t_k) = Q_k / \Delta t$, $R(t_k) = R_k\Delta t$, $\hat{x}(t_k) = \hat{x}_{k/k}$, $P(t_k) = P_{k/k-1}$,
$\dot{\hat{x}}(t_k) = \lim_{\Delta t \to 0} \dfrac{\hat{x}_{k/k} - \hat{x}_{k-1/k-1}}{\Delta t}$, $\dot{P}(t_k) = \lim_{\Delta t \to 0} \dfrac{P_{k+1/k} - P_{k/k-1}}{\Delta t}$ and $\Delta t = t_k - t_{k-1}$.)


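Problem 5. Derive the two-step-ahead predicted error covariance
$P_{k+2/k} = A_{k+1}P_{k+1/k}A_{k+1}^T + B_{k+1}Q_{k+1}B_{k+1}^T$.


Problem 6. Verify that the Riccati difference equation
$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k(C_kP_{k/k-1}C_k^T + R_k)K_k^T + B_kQ_kB_k^T$, where
$K_k = (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1}$, is equivalent to
$P_{k+1/k} = (A_k - K_kC_k)P_{k/k-1}(A_k - K_kC_k)^T + K_kR_kK_k^T + B_kQ_kB_k^T - B_kS_kK_k^T - K_kS_k^TB_k^T$.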


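Problem 7 [16]. Suppose that the systems $y_{1,k} = \mathcal{G}_1w_k$ and $y_{2,k} = \mathcal{G}_2w_k$ have the
state-space realisations

$$\begin{bmatrix} x_{1,k+1} \\ y_{1,k} \end{bmatrix} = \begin{bmatrix} A_{1,k} & B_{1,k} \\ C_{1,k} & D_{1,k} \end{bmatrix} \begin{bmatrix} x_{1,k} \\ w_k \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x_{2,k+1} \\ y_{2,k} \end{bmatrix} = \begin{bmatrix} A_{2,k} & B_{2,k} \\ C_{2,k} & D_{2,k} \end{bmatrix} \begin{bmatrix} x_{2,k} \\ w_k \end{bmatrix} .$$

Show that the system $y_{3,k} = \mathcal{G}_2\mathcal{G}_1w_k$ is given by

$$\begin{bmatrix} x_{1,k+1} \\ x_{2,k+1} \\ y_{3,k} \end{bmatrix} = \begin{bmatrix} A_{1,k} & 0 & B_{1,k} \\ B_{2,k}C_{1,k} & A_{2,k} & B_{2,k}D_{1,k} \\ D_{2,k}C_{1,k} & C_{2,k} & D_{2,k}D_{1,k} \end{bmatrix} \begin{bmatrix} x_{1,k} \\ x_{2,k} \\ w_k \end{bmatrix} .$$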


4.21 Glossary 


In addition to the notation listed in Section 2.6, the following nomenclature has 
been used herein. 


$\mathcal{G}$ : A system that is assumed to have the realisation $x_{k+1} = A_kx_k + B_kw_k$
and $y_k = C_kx_k + D_kw_k$ where $A_k$, $B_k$, $C_k$ and $D_k$ are time-varying matrices of appropriate dimension.

$Q_k$, $R_k$ : Time-varying covariance matrices of stochastic signals $w_k$ and $v_k$, respectively.

$\mathcal{G}^H$ : Adjoint of $\mathcal{G}$. The adjoint of a system having the state-space
parameters $\{A_k, B_k, C_k, D_k\}$ is a system parameterised by $\{A_k^T, -C_k^T, -B_k^T, D_k^T\}$.

$\hat{x}_{k/k}$ : Filtered estimate of the state $x_k$ given measurements at time k.

$\tilde{x}_{k/k}$ : Filtered state estimation error which is defined by $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$.

$P_{k/k}$ : Corrected error covariance matrix at time k given measurements at time k.

$L_k$ : Time-varying filter gain matrix.

$\hat{x}_{k+1/k}$ : Predicted estimate of the state $x_{k+1}$ given measurements at time k.

$\tilde{x}_{k+1/k}$ : Predicted state estimation error which is defined by $\tilde{x}_{k+1/k} = x_{k+1} - \hat{x}_{k+1/k}$.

$P_{k+1/k}$ : Predicted error covariance matrix at time k + 1 given measurements at time k.

$K_k$ : Time-varying predictor gain matrix.

RLS : Recursive Least Squares.


5. Discrete-Time, Steady-State, Minimum-Variance Filtering


5.1 Introduction 


This chapter presents the minimum-variance filtering results simplified for the 
case when the model parameters are time-invariant and the noise processes are 
stationary. The filtering objective remains the same, namely, the task is to estimate
a signal in such a way as to minimise the filter error covariance. A somewhat naive
approach is to apply the standard filter recursions using the time-invariant problem 
parameters. Although this approach is valid, it involves recalculating the Riccati 
difference equation solution and filter gain at each time-step, which is 
computationally expensive. A lower implementation cost can be realised by 
recognising that the Riccati difference equation solution asymptotically 
approaches the solution of an algebraic Riccati equation. In this case, the algebraic 
Riccati equation solution and hence the filter gain can be calculated before 
running the filter. 


The steady-state discrete-time Kalman filtering literature is vast and some of the 
more accessible accounts [1] — [14] are canvassed here. The filtering problem and 
the application of the standard time-varying filter recursions are described in 
Section 5.2. An important criterion for checking whether the states can be 
uniquely reconstructed from the measurements is observability. For example, 
sometimes states may be internal or sensor measurements might not be available, 
which can result in the system having hidden modes. Section 5.3 describes two 
common tests for observability, namely, checking that an observability matrix or
an observability gramian is of full rank. The subject of Riccati equation
monotonicity and convergence has been studied extensively by Chan [4], De
Souza [5], [6], Bitmead [7], [8], Wimmer [9] and Wonham [10], and is
discussed in Section 5.4. Chan et al. [4] also showed that if the underlying system
is stable and observable then the minimum-variance filter is stable. Section 5.6
describes a discrete-time version of the Kalman-Yakubovich-Popov Lemma,
which states for time-invariant systems that solving a Riccati equation is
equivalent to spectral factorisation. In this case, the Wiener and Kalman filters are
the same.


Since the optimal filter is model-based, any unknown model parameters need to be 
estimated (as explained in Chapter 7) prior to implementation. The estimated 
parameters can be inexact which leads to degraded filter performance. An iterative 
frequency weighting procedure is described in Section 5.5 for mitigating the 
performance degradation. 


5.2. Time-Invariant Filtering Problem 
5.2.1 The Time-Invariant Signal Model 


A discrete-time time-invariant system (or plant) $\mathcal{G} : \mathbb{R}^m \to \mathbb{R}^p$ is assumed to
have the state-space representation

$$x_{k+1} = Ax_k + Bw_k , \qquad (1)$$
$$y_k = Cx_k + Dw_k , \qquad (2)$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times m}$, and $w_k$ is a stationary process with
$E\{w_k\} = 0$ and $E\{w_jw_k^T\} = Q\delta_{jk}$. For convenience, the simplification D = 0 is
initially assumed within the developments. The nonzero feedthrough matrix, D,
can be accommodated as described in Chapter 4. Observations $z_k$ of the system
output $y_k$ are again modelled as

$$z_k = y_k + v_k , \qquad (3)$$

where $v_k$ is a stationary measurement noise sequence over an interval k ∈ [1, N],
with $E\{v_k\} = 0$, $E\{w_jv_k^T\} = 0$ and $E\{v_jv_k^T\} = R\delta_{jk}$. The objective is to design a filter
$\mathcal{H}$ that operates on the above measurements and produces an estimate, $\hat{y}_{k/k} = C\hat{x}_{k/k}$,
of $y_k$ so that the covariance, $E\{\tilde{y}_{k/k}\tilde{y}_{k/k}^T\}$, of the filter error, $\tilde{y}_{k/k} = y_k - \hat{y}_{k/k}$, is minimised.

5.2.2 Application of the Time-Varying Filter Recursions 
A naive but entirely valid approach to state estimation is to apply the standard 


minimum-variance filter recursions of Chapter 4 for the problem (1) — (3). The 
predicted and corrected state estimates are given by 


$$\hat{x}_{k+1/k} = (A - K_kC)\hat{x}_{k/k-1} + K_kz_k , \qquad (4)$$
$$\hat{x}_{k/k} = (I - L_kC)\hat{x}_{k/k-1} + L_kz_k , \qquad (5)$$

where $L_k = P_{k/k-1}C^T(CP_{k/k-1}C^T + R)^{-1}$ is the filter gain, $K_k = AP_{k/k-1}C^T(CP_{k/k-1}C^T + R)^{-1}$
is the predictor gain, in which $P_{k/k-1} = E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$ is obtained from the
Riccati difference equation (written with the shorthand $P_k = P_{k/k-1}$)

$$P_{k+1} = AP_kA^T - AP_kC^T(CP_kC^T + R)^{-1}CP_kA^T + BQB^T . \qquad (6)$$

As before, the above Riccati equation is iterated forward at each time k from an
initial condition $P_0$. A necessary condition for determining whether the states
within (1) can be uniquely estimated is observability which is discussed below.
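
The naive approach of running the time-varying recursions (4) - (6) on a time-invariant model can be sketched as follows. The code assumes NumPy; the function name `filter_recursion`, the model values, the initial conditions and the simulated data are all hypothetical and serve only to show that the gain is recomputed at every step even though it quickly settles to a constant.

```python
import numpy as np

def filter_recursion(z, A, B, C, Q, R, x0, P0):
    """Run (4) - (6); returns filtered states, predictor gains and the final P."""
    x_pred, P = x0, P0
    x_filt, gains = [], []
    for zk in z:
        Omega = C @ P @ C.T + R
        L = P @ C.T @ np.linalg.inv(Omega)            # filter gain
        K = A @ L                                     # predictor gain
        innovation = zk - C @ x_pred
        x_filt.append(x_pred + L @ innovation)        # corrected state (5)
        x_pred = A @ x_pred + K @ innovation          # predicted state (4)
        P = A @ P @ A.T - K @ Omega @ K.T + B @ Q @ B.T   # Riccati equation (6)
        gains.append(K)
    return np.array(x_filt), np.array(gains), P

# Hypothetical stable, observable second-order model.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(2)
C = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)
R = np.array([[0.2]])

# Simulated measurements (seeded so that the run is repeatable).
rng = np.random.default_rng(0)
x, z = np.zeros(2), []
for _ in range(200):
    z.append(C @ x + rng.normal(scale=np.sqrt(R[0, 0]), size=1))
    x = A @ x + B @ rng.multivariate_normal(np.zeros(2), Q)

x_filt, gains, P_final = filter_recursion(z, A, B, C, Q, R, np.zeros(2), np.eye(2))
print("first gain:", gains[0].ravel(), "last gain:", gains[-1].ravel())
```

The repeated matrix inversion inside the loop is the implementation cost that a precomputed steady-state gain avoids.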


5.3. Observability 


5.3.1 The Discrete-time Observability Matrix 


Observability is a fundamental concept in system theory. If a system is 
unobservable then it will not be possible to recover the states uniquely from the 
measurements. The pair (A, C) within the discrete-time system (1) - (2) is defined
to be completely observable if the initial states, $x_0$, can be uniquely determined
from the known inputs $w_k$ and outputs $y_k$ over an interval k ∈ [0, N]. A test for
observability is to check whether an observability matrix is of full rank. The
discrete-time observability matrix, which is defined in the lemma below, is the
same as the continuous-time version. The proof is analogous to the presentation in
Chapter 3.


Lemma 1 [1], [2]: The discrete-time system (1) - (2) is completely observable if
the observability matrix

$$O_N = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^N \end{bmatrix} , \quad N \geq n - 1 , \qquad (7)$$

is of rank n.


Proof: Since the input $w_k$ is assumed to be known, it suffices to consider the
unforced system

$$x_{k+1} = Ax_k , \qquad (8)$$
$$y_k = Cx_k . \qquad (9)$$

It follows from (8) - (9) that

$$\begin{aligned} y_0 &= Cx_0 \\ y_1 &= Cx_1 = CAx_0 \\ y_2 &= Cx_2 = CA^2x_0 \\ &\;\;\vdots \\ y_N &= Cx_N = CA^Nx_0 , \qquad (10) \end{aligned}$$

which can be written as

$$y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^N \end{bmatrix} x_0 = O_Nx_0 . \qquad (11)$$

From the Cayley-Hamilton Theorem, $A^k$, for k ≥ n, can be expressed as a linear
combination of $A^0$, $A^1$, ..., $A^{n-1}$. Thus, with N ≥ n - 1, equation (11) uniquely
determines $x_0$ if $O_N$ has full rank n.


Thus, if $O_N$ is of full rank then a left inverse exists and so $x_0$ can be uniquely
recovered as $x_0 = (O_N^TO_N)^{-1}O_N^Ty$, which reduces to $x_0 = O_N^{-1}y$ when $O_N$ is square.
Observability is a property of the deterministic model
equations (8) - (9). Conversely, if the observability matrix is not rank n then the
system (1) - (2) is termed unobservable and the unobservable states are called
unobservable modes.
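
The rank test of Lemma 1 and the recovery of $x_0$ from the stacked outputs in (11) can be sketched as below. NumPy is assumed; the helper names `observability_matrix` and `is_observable`, and the numerical values of A, C and $x_0$, are illustrative choices made here.

```python
import numpy as np

def observability_matrix(A, C, N=None):
    """Stack C, CA, ..., CA^N as in (7); N defaults to n - 1."""
    n = A.shape[0]
    N = n - 1 if N is None else N
    blocks = [C]
    for _ in range(N):
        blocks.append(blocks[-1] @ A)     # next block row is C A^k
    return np.vstack(blocks)

def is_observable(A, C):
    """Rank test of Lemma 1."""
    return np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0]

# Illustrative second-order system.
A = np.array([[0.1, 0.2], [0.0, 0.4]])
C = np.array([[1.0, 1.0]])

# Generate unforced outputs y_0, y_1 from a chosen x_0 and recover x_0 via (11).
x0 = np.array([1.0, -2.0])
O = observability_matrix(A, C)
y = O @ x0
x0_recovered = np.linalg.lstsq(O, y, rcond=None)[0]

print("observable with C = [1 1]:", is_observable(A, C))
print("observable with C = [0 1]:", is_observable(A, np.array([[0.0, 1.0]])))
```

A least-squares solve is used for the recovery so that the same code also handles the case N > n - 1, where $O_N$ is tall and not square.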


5.3.2 Discrete-time Observability Gramians 


Alternative tests for observability arise by checking the rank of one of the 
observability gramians that are described below. 


Lemma 2: The pair (A, C) is completely observable if the observability gramian

$$W_N = O_N^TO_N = \sum_{k=0}^{N} (A^T)^kC^TCA^k , \quad N \geq n - 1 , \qquad (12)$$

is of full rank.

Proof: It follows from (8) - (9) that

$$y^Ty = x_0^T \begin{bmatrix} C^T & A^TC^T & (A^T)^2C^T & \cdots & (A^T)^NC^T \end{bmatrix} \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^N \end{bmatrix} x_0 . \qquad (13)$$

From the Cayley-Hamilton Theorem, $A^k$, for k ≥ n, can be expressed as a linear
combination of $A^0$, $A^1$, ..., $A^{n-1}$. Thus, with N = n - 1,

$$y^Ty = x_0^TO_N^TO_Nx_0 = x_0^T \left( \sum_{k=0}^{n-1} (A^T)^kC^TCA^k \right) x_0 \qquad (14)$$

is unique provided that $W_N$ is of full rank.


It is shown below that an equivalent observability gramian can be found from the 
solution of a Lyapunov equation. 


Lemma 3: Suppose that the system (8) - (9) is stable, that is, $|\lambda_i(A)| < 1$, i = 1 to
n, then the pair (A, C) is completely observable if the nonnegative symmetric
solution of the Lyapunov equation

$$W = A^TWA + C^TC \qquad (15)$$

is of full rank.


Proof: Pre-multiplying $C^TC = W - A^TWA$ by $(A^T)^k$, post-multiplying by $A^k$ and
summing from k = 0 to N results in

$$\begin{aligned} \sum_{k=0}^{N} (A^T)^kC^TCA^k &= \sum_{k=0}^{N} (A^T)^kWA^k - \sum_{k=0}^{N} (A^T)^{k+1}WA^{k+1} \\ &= W - (A^T)^{N+1}WA^{N+1} . \qquad (16) \end{aligned}$$

Since $\lim_{N \to \infty} (A^T)^{N+1}WA^{N+1} = 0$, by inspection of (16), $W = \lim_{N \to \infty} W_N$ is a solution of
the Lyapunov equation (15). Observability follows from Lemma 2.


It is noted below that observability is equivalent to asymptotic stability. 


Lemma 4 [3]: Under the conditions of Lemma 3, $x_0 \in \ell_2$ implies $y \in \ell_2$.


Proof: It follows from (16) that $\sum_{k=0}^{N} (A^T)^kC^TCA^k \leq W$ and therefore

$$\|y\|_2^2 = \sum_{k=0}^{N} y_k^Ty_k = x_0^T \left( \sum_{k=0}^{N} (A^T)^kC^TCA^k \right) x_0 \leq x_0^TWx_0 ,$$

from which the claim follows.
Another criterion that is encountered in the context of filtering and smoothing is 


reachability. A linear time-invariant system is said to be reachable when all its 
modes are excited. Reachability is discussed in Chapters 8 and 11. 


Example 1. (i) Consider a stable second-order system with $A = \begin{bmatrix} 0.1 & 0.2 \\ 0 & 0.4 \end{bmatrix}$ and
$C = [1 \;\; 1]$. The observability matrix from (7) and the observability gramian from
(12) are $O_1 = \begin{bmatrix} C \\ CA \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0.1 & 0.6 \end{bmatrix}$ and $W_1 = O_1^TO_1 = \begin{bmatrix} 1.01 & 1.06 \\ 1.06 & 1.36 \end{bmatrix}$, respectively.
It can easily be verified that the solution of the Lyapunov equation (15) is
$W = \begin{bmatrix} 1.01 & 1.06 \\ 1.06 & 1.44 \end{bmatrix} = W_4$, to three significant figures. Since rank($O_1$) = rank($W_1$) =
rank($W_4$) = 2, the pair (A, C) is observable.


(ii) Now suppose that measurements of the first state are not available, that is,
$C = [0 \;\; 1]$. Since $O_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0.4 \end{bmatrix}$ and $W_1 = \begin{bmatrix} 0 & 0 \\ 0 & 1.16 \end{bmatrix}$ are of rank 1, the pair (A,
C) is unobservable. This system is detectable because the unobservable mode is
stable.
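
The gramian and Lyapunov calculations of Example 1 can be checked numerically with the sketch below. NumPy is assumed, and the matrix A used here is the one implied by the observability matrices quoted in the example (CA = [0.1 0.6] for C = [1 1] and CA = [0 0.4] for C = [0 1]). The function names are hypothetical, and the Lyapunov equation (15) is solved by vectorising it with a Kronecker product, which is one convenient choice for small systems and not necessarily the method used for the example.

```python
import numpy as np

def observability_gramian(A, C, N):
    """Finite-horizon gramian W_N of (12)."""
    W = np.zeros_like(A, dtype=float)
    Ak = np.eye(A.shape[0])
    for _ in range(N + 1):
        W += Ak.T @ C.T @ C @ Ak      # add (A^T)^k C^T C A^k
        Ak = Ak @ A
    return W

def lyapunov_solution(A, C):
    """Solve W = A^T W A + C^T C via vec(W) = (I - A^T kron A^T)^{-1} vec(C^T C)."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A.T, A.T)
    w = np.linalg.solve(lhs, (C.T @ C).reshape(-1))
    return w.reshape(n, n)

A = np.array([[0.1, 0.2], [0.0, 0.4]])
C = np.array([[1.0, 1.0]])

W1 = observability_gramian(A, C, 1)
W4 = observability_gramian(A, C, 4)
W = lyapunov_solution(A, C)
print(np.round(W1, 2))
print(np.round(W, 2))
print("rank(W1) =", np.linalg.matrix_rank(W1), " rank(W) =", np.linalg.matrix_rank(W))

# Case (ii): the first state is not measured.
C_ii = np.array([[0.0, 1.0]])
print("rank with C = [0 1]:", np.linalg.matrix_rank(observability_gramian(A, C_ii, 1)))
```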


5.4 Riccati Equation Properties 
5.4.1 Monotonicity 


It will be shown below that the solution $P_{k+1/k}$ of the Riccati difference equation
(6) monotonically approaches a steady-state asymptote, in which case the gain is
also time-invariant and can be precalculated. Establishing monotonicity requires
the following result. It is well known that the difference between the solutions of
two Riccati equations also obeys a Riccati equation, see Theorem 4.3 of [4], (2.12)
of [5], Lemma 3.1 of [6], (4.2) of [7], Lemma 10.1 of [8], (2.11) of [9] and (2.4) of
[10].

Theorem 1: Riccati Equation Comparison Theorem [4] - [10]: Suppose for a t ≥
0 and for all k ≥ 0 the two Riccati difference equations

$$P_{t+k} = AP_{t+k-1}A^T - AP_{t+k-1}C^T(CP_{t+k-1}C^T + R)^{-1}CP_{t+k-1}A^T + BQB^T , \qquad (17)$$
$$P_{t+k+1} = AP_{t+k}A^T - AP_{t+k}C^T(CP_{t+k}C^T + R)^{-1}CP_{t+k}A^T + BQB^T , \qquad (18)$$

have solutions $P_{t+k} \geq 0$ and $P_{t+k+1} \geq 0$, respectively. Then $\bar{P}_{t+k} = P_{t+k} - P_{t+k+1}$
satisfies

$$\bar{P}_{t+k+1} = \bar{A}_{t+k+1}\bar{P}_{t+k}\bar{A}_{t+k+1}^T - \bar{A}_{t+k+1}\bar{P}_{t+k}C^T(C\bar{P}_{t+k}C^T + \bar{R}_{t+k+1})^{-1}C\bar{P}_{t+k}\bar{A}_{t+k+1}^T , \qquad (19)$$

where $\bar{A}_{t+k+1} = A - AP_{t+k+1}C^T(CP_{t+k+1}C^T + R)^{-1}C$ and $\bar{R}_{t+k+1} = CP_{t+k+1}C^T + R$.


The above result can be verified by substituting $\bar{A}_{t+k+1}$ and $\bar{R}_{t+k+1}$ into (19). The
above theorem is used below to establish Riccati difference equation
monotonicity.

Theorem 2 [6], [9], [10], [11]: Under the conditions of Theorem 1, suppose that
the Riccati difference equation (19) has a solution $\bar{P}_{t+k} \geq 0$ for a t ≥
0 and k = 0. Then $P_{t+k} \geq P_{t+k+1}$ for all k ≥ 0.


Proof: The assumption $\bar{P}_t \geq 0$ is the initial condition for an induction argument.
For the induction step, it follows from $C\bar{P}_{t+k}C^T(C\bar{P}_{t+k}C^T + \bar{R}_{t+k+1})^{-1} \leq I$ that
$\bar{P}_{t+k} \geq \bar{P}_{t+k}C^T(C\bar{P}_{t+k}C^T + \bar{R}_{t+k+1})^{-1}C\bar{P}_{t+k}$,
which together with Theorem 1 implies $\bar{P}_{t+k+1} \geq 0$.


The above theorem serves to establish conditions under which a Riccati difference 
equation solution monotonically approaches its steady state solution. This requires 
a Riccati equation convergence result which is presented below. 


5.4.2 Convergence 


When the model parameters and noise statistics are constant then the predictor
gain is also time-invariant and can be pre-calculated as

$$K = APC^T(CPC^T + R)^{-1} , \qquad (20)$$

where P is the symmetric positive definite solution of the algebraic Riccati
equation

$$\begin{aligned} P &= APA^T - APC^T(CPC^T + R)^{-1}CPA^T + BQB^T \qquad (21) \\ &= (A - KC)P(A - KC)^T + BQB^T + KRK^T . \qquad (22) \end{aligned}$$


A real symmetric nonnegative definite solution of the Algebraic Riccati equation 
(21) is said to be a strong solution if the eigenvalues of (A — KC) lie inside or on 
the unit circle [4], [5]. If there are no eigenvalues on the unit circle then the strong 
solution is termed the stabilising solution. The following lemma by Chan, 
Goodwin and Sin [4] sets out conditions for the existence of solutions for the 
algebraic Riccati equation (21). 


Lemma 5 [4]: Provided that the pair (A, C) is detectable, then 
i) the strong solution of the algebraic Riccati equation (21) exists and is 
unique; 
ii) if A has no modes on the unit circle then the strong solution coincides 
with the stabilising solution. 


A detailed proof is presented in [4]. If the linear time-invariant system (1) - (2) is
stable and completely observable and the solution $P_k$ of the Riccati difference
equation (6) is suitably initialised, then in the limit as k approaches infinity, $P_k$
will asymptotically converge to the solution of the algebraic Riccati equation. This
convergence property is formally restated below.


Lemma 6 [4]: Subject to:
i) the pair (A, C) is observable;
ii) $|\lambda_i(A)| < 1$, i = 1 to n;
iii) $(P_0 - P) \geq 0$;
then the solution of the Riccati difference equation (6) satisfies

$$\lim_{k \to \infty} P_k = P . \qquad (23)$$


A proof appears in [4]. This important property is used in [6], which is in turn 
employed within [7] and [8]. Similar results are reported in [5], [13] and [14]. 
Convergence can occur exponentially fast which is demonstrated by the following 
numerical example. 


Example 2. Consider an output estimation problem where A = 0.9 and B = C = Q
= R = 1. The solution to the algebraic Riccati equation (21) is P = 1.4839. Some
calculated solutions of the Riccati difference equation (6) initialised with $P_0 = 10P$
are shown in Table 1. The data in the table demonstrate that the Riccati difference
equation solution converges to the algebraic Riccati equation solution, which
illustrates Lemma 6.


| k | $P_k$ | $P_{k-1} - P_k$ |
|---|---|---|
| 1 | 1.7588 | 13.0801 |
| 2 | 1.5164 | 0.2425 |
| 5 | 1.4840 | 4.7955 × 10^{-4} |
| 10 | 1.4839 | 1.8698 × 10^{-8} |


Table 1. Solutions of the Riccati difference equation (6) for Example 2.
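
The convergence shown in Example 2 can be reproduced with a few lines of standard-library Python. In the sketch below the scalar algebraic Riccati equation (21) is solved in closed form by rearranging it into the quadratic $C^2P^2 + (R - A^2R - B^2QC^2)P - B^2QR = 0$, which is a step carried out here for illustration; the function and variable names are hypothetical.

```python
import math

A, B, C, Q, R = 0.9, 1.0, 1.0, 1.0, 1.0

def riccati_step(P):
    """Scalar form of the Riccati difference equation (6)."""
    return A * P * A - (A * P * C) ** 2 / (C * P * C + R) + B * Q * B

def scalar_are_solution():
    """Positive root of C^2 P^2 + (R - A^2 R - B^2 Q C^2) P - B^2 Q R = 0."""
    b = R - A * A * R - B * B * Q * C * C
    return (-b + math.sqrt(b * b + 4.0 * C * C * B * B * Q * R)) / (2.0 * C * C)

P_are = scalar_are_solution()
history = [10.0 * P_are]                 # P_0 = 10 P, as in Example 2
for _ in range(10):
    history.append(riccati_step(history[-1]))

# differences[k - 1] holds P_{k-1} - P_k
differences = [history[k - 1] - history[k] for k in range(1, 11)]

for k in (1, 2, 5, 10):
    print(k, round(history[k], 4), differences[k - 1])
```

The printed values agree with the tabulated ones to within the last printed digit, and every difference is positive, so the sequence decreases monotonically towards P as Theorem 2 predicts for $P_0 \geq P$.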


5.5 The Steady-State Minimum-Variance Filter 


5.5.1 State Estimation 


The formulation of the steady-state Kalman filter (which is also known as the
limiting Kalman filter) follows by allowing k to approach infinity and using the
result of Lemma 6. That is, the filter employs fixed gains that are calculated using
the solution of the algebraic Riccati equation (21) instead of the Riccati difference
equation (6). The filtered state is calculated as

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L(z_k - C\hat{x}_{k/k-1}) = (I - LC)\hat{x}_{k/k-1} + Lz_k , \qquad (24)$$

where $L = PC^T(CPC^T + R)^{-1}$ is the time-invariant filter gain, in which P is the
solution of the algebraic Riccati equation (21). The predicted state is given by

$$\hat{x}_{k+1/k} = A\hat{x}_{k/k} = (A - KC)\hat{x}_{k/k-1} + Kz_k , \qquad (25)$$

where the time-invariant predictor gain, K, is calculated from (20).
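
A steady-state filter along the lines of (24) - (25) can be sketched as follows, assuming NumPy. Here the algebraic Riccati equation (21) is solved simply by iterating the difference equation (6) until it stops changing, which is one possible approach and is chosen only to keep the sketch self-contained. The function names, the initial condition and the measurement values are hypothetical; the model values are those of Example 2 written as 1 x 1 matrices.

```python
import numpy as np

def solve_are_by_iteration(A, B, C, Q, R, tol=1e-12, max_iter=10000):
    """Iterate (6) from P = B Q B^T until successive iterates agree to within tol."""
    P = B @ Q @ B.T
    for _ in range(max_iter):
        Omega = C @ P @ C.T + R
        K = A @ P @ C.T @ np.linalg.inv(Omega)
        P_next = A @ P @ A.T - K @ Omega @ K.T + B @ Q @ B.T
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

def steady_state_gains(A, B, C, Q, R):
    """Return P from (21), the predictor gain K of (20) and the filter gain L."""
    P = solve_are_by_iteration(A, B, C, Q, R)
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    K = A @ L
    return P, K, L

def steady_state_filter(z, A, C, K, L, x0):
    """Fixed-gain recursions (24) and (25)."""
    x_pred, x_filt = x0, []
    for zk in z:
        innovation = zk - C @ x_pred
        x_filt.append(x_pred + L @ innovation)      # filtered state (24)
        x_pred = A @ x_pred + K @ innovation        # predicted state (25)
    return np.array(x_filt)

A = np.array([[0.9]]); B = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])

P, K, L = steady_state_gains(A, B, C, Q, R)
z = [np.array([v]) for v in (1.0, 0.5, 0.8, 1.2)]   # made-up measurements
x_filt = steady_state_filter(z, A, C, K, L, np.zeros(1))
print("P =", P.ravel(), "K =", K.ravel(), "L =", L.ravel())
print("eigenvalues of A - KC:", np.linalg.eigvals(A - K @ C))
```

Because the gains are computed once before the data arrive, the per-sample cost reduces to a few matrix-vector products.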


5.5.2 Asymptotic Stability 


The asymptotic stability of the filter (24) - (25) is asserted in two ways. First,
recall from Lemma 5 (ii) that if $|\lambda_i(A)| < 1$, i = 1 to n, and the pair (A, C) is
completely observable, then $|\lambda_i(A - KC)| < 1$, i = 1 to n. That is, since the
eigenvalues of the filter's state matrix are within the unit circle, the filter is
asymptotically stable. Second, according to Lyapunov stability theory [1], the
unforced system (8) is asymptotically stable if there exists a scalar continuous
function V(x), satisfying the following.

(i) $V(x_k) > 0$ for $x_k \neq 0$.

(ii) $V(x_{k+1}) - V(x_k) < 0$ for $x_k \neq 0$.

(iii) $V(0) = 0$.

(iv) $V(x_k) \to \infty$ as $\|x_k\|_2 \to \infty$.

Consider the function $V(x_k) = x_k^TPx_k$ where P is a real positive definite
symmetric matrix. Observe that $V(x_{k+1}) - V(x_k) = x_{k+1}^TPx_{k+1} - x_k^TPx_k = x_k^T(A^TPA - P)x_k < 0$.
Therefore, the above stability requirements are satisfied if
for a real symmetric positive definite Q, there exists a real symmetric positive
definite P solution to the Lyapunov equation

$$APA^T - P = -Q . \qquad (26)$$

By inspection, the design algebraic Riccati equation (22) is of the form (26) and so
the filter is said to be stable in the sense of Lyapunov.


5.5.3 Output Estimation 


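For output estimation problems, the filter gain, L, is calculated differently. The
output estimate is given by

$$\begin{aligned} \hat{y}_{k/k} &= C\hat{x}_{k/k} \\ &= C\hat{x}_{k/k-1} + L(z_k - C\hat{x}_{k/k-1}) \\ &= (C - LC)\hat{x}_{k/k-1} + Lz_k , \qquad (27) \end{aligned}$$

where the filter gain is now obtained by $L = CPC^T(CPC^T + R)^{-1}$. The output
estimation filter (24) - (25) can be written compactly as

$$\begin{bmatrix} \hat{x}_{k+1/k} \\ \hat{y}_{k/k} \end{bmatrix} = \begin{bmatrix} (A - KC) & K \\ (C - LC) & L \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix} , \qquad (28)$$

from which its transfer function is

$$H_{OE}(z) = (C - LC)(zI - A + KC)^{-1}K + L . \qquad (29)$$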


5.6 Equivalence of the Wiener and Kalman Filters

As in continuous-time, solving a discrete-time algebraic Riccati equation is
equivalent to spectral factorisation and the corresponding Kalman-Yakubovich-Popov
Lemma (or Positive Real Lemma) is set out below. A proof of this Lemma
makes use of the following identity

$$ P - APA^T = (zI - A)P(z^{-1}I - A^T) + AP(z^{-1}I - A^T) + (zI - A)PA^T. \tag{30} $$


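Lemma 7. Consider the spectral density matrix

$$ \Delta\Delta^H(z) = \begin{bmatrix} C(zI - A)^{-1} & I \end{bmatrix} \begin{bmatrix} BQB^T & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} (z^{-1}I - A^T)^{-1}C^T \\ I \end{bmatrix}. \tag{31} $$

Then the following statements are equivalent.

(i) $\Delta\Delta^H(e^{j\omega}) \ge 0$, for all $\omega \in (-\pi, \pi)$.

(ii) $\begin{bmatrix} BQB^T - P + APA^T & APC^T \\ CPA^T & CPC^T + R \end{bmatrix} \ge 0$.

(iii) There exists a nonnegative solution $P$ of the algebraic Riccati equation (21).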
Proof: Following the approach of [12], to establish equivalence between (i) and
(iii), use (21) within (30) to obtain

$$ BQB^T - APC^T(CPC^T + R)^{-1}CPA^T = (zI - A)P(z^{-1}I - A^T) + AP(z^{-1}I - A^T) + (zI - A)PA^T. \tag{32} $$

Premultiplying and postmultiplying (32) by $C(zI - A)^{-1}$ and $(z^{-1}I - A^T)^{-1}C^T$,
respectively, results in $C(zI - A)^{-1}(BQB^T - APC^T\Omega^{-1}CPA^T)(z^{-1}I - A^T)^{-1}C^T = CPC^T
+ C(zI - A)^{-1}APC^T + CPA^T(z^{-1}I - A^T)^{-1}C^T$, where $\Omega = CPC^T + R$. Hence,

$$ \begin{aligned}
\Delta\Delta^H(z) &= GQG^H(z) + R = C(zI - A)^{-1}BQB^T(z^{-1}I - A^T)^{-1}C^T + R \\
&= C(zI - A)^{-1}APC^T\Omega^{-1}CPA^T(z^{-1}I - A^T)^{-1}C^T + C(zI - A)^{-1}APC^T + CPA^T(z^{-1}I - A^T)^{-1}C^T + \Omega \\
&= (C(zI - A)^{-1}K + I)\,\Omega\,(K^T(z^{-1}I - A^T)^{-1}C^T + I) \\
&\ge 0.
\end{aligned} \tag{33} $$

The Schur complement formula can be used to verify the equivalence of (ii) and
(iii).
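The factorisation (33) can be spot-checked numerically for a scalar system. The sketch below is illustrative only: the parameter values are assumptions, the function name `solve_riccati` is hypothetical, and the algebraic Riccati equation is solved by simply iterating the Riccati difference equation rather than by the method used in the text. It compares $GQG^H + R$ with $(C(zI - A)^{-1}K + I)\Omega(\cdot)^H$ at a few points on the unit circle.

```python
import cmath


def solve_riccati(a, b, c, q, r, iterations=500):
    """Iterate the scalar Riccati difference equation to steady state."""
    p = q
    for _ in range(iterations):
        omega = c * p * c + r
        p = a * p * a - (a * p * c) ** 2 / omega + b * q * b
    return p


# Assumed scalar parameters for illustration.
a, b, c, q, r = 0.9, 1.0, 1.0, 1.0, 0.5
p = solve_riccati(a, b, c, q, r)
omega = c * p * c + r          # Omega = C P C^T + R
k = a * p * c / omega          # predictor gain K = A P C^T Omega^{-1}

max_relative_gap = 0.0
for w in (0.0, 0.5, 1.0, 2.0, 3.0):
    z = cmath.exp(1j * w)
    lhs = abs(c * b / (z - a)) ** 2 * q + r        # G Q G^H + R
    rhs = abs(c * k / (z - a) + 1.0) ** 2 * omega  # spectral factor form of (33)
    max_relative_gap = max(max_relative_gap, abs(lhs - rhs) / lhs)
```

Both sides are real and nonnegative at every frequency, which is statement (i) of Lemma 7 for this example.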

In Chapter 2, it is shown that the transfer function matrix of the optimal Wiener
solution for output estimation is given by

$$ H_{OE}(z) = I - R\{\Delta^{-H}\}_+\Delta^{-1}(z), \tag{34} $$

where $\{\,\}_+$ denotes the causal part. This filter produces estimates $\hat{y}_{k/k}$ from
measurements $z_k$. By inspection of (33) it follows that the spectral factor is

$$ \Delta(z) = C(zI - A)^{-1}K\Omega^{1/2} + \Omega^{1/2}. \tag{35} $$

The Wiener output estimator (34) involves $\Delta^{-1}(z)$ which can be found using (35)
and a special case of the matrix inversion lemma, namely, $[I + C(zI - A)^{-1}K]^{-1} = I
- C(zI - A + KC)^{-1}K$. Thus, the spectral factor inverse is

$$ \Delta^{-1}(z) = \Omega^{-1/2} - \Omega^{-1/2}C(zI - A + KC)^{-1}K. \tag{36} $$

It can be seen from (36) that $\{\Delta^{-H}\}_+ = \Omega^{-1/2}$. Recognising that $I - R\Omega^{-1} =
(CPC^T + R)(CPC^T + R)^{-1} - R(CPC^T + R)^{-1} = CPC^T(CPC^T + R)^{-1} = L$, the Wiener
filter (34) can be written equivalently

$$ \begin{aligned}
H_{OE}(z) &= I - R\Omega^{-1/2}\Delta^{-1}(z) \\
&= I - R\Omega^{-1} + R\Omega^{-1}C(zI - A + KC)^{-1}K \\
&= L + (C - LC)(zI - A + KC)^{-1}K,
\end{aligned} \tag{37} $$

which is identical to the transfer function matrix of the Kalman filter for output
estimation (29). In Chapter 2, it is shown that the transfer function matrix of the
input estimator (or equaliser) for proper, stable, minimum-phase plants is

$$ H_{IE}(z) = G^{-1}(z)(I - R\{\Delta^{-H}\}_+\Delta^{-1}(z)). \tag{38} $$

Substituting (35) into (38) gives

$$ H_{IE}(z) = G^{-1}(z)H_{OE}(z). \tag{39} $$

The above Wiener equaliser transfer function matrices require common poles and
zeros to be cancelled. Although the solution (39) is not minimum-order (since
some pole-zero cancellations can be made), its structure is instructive. In
particular, an estimate of $w_k$ can be obtained by operating the plant inverse on


$\hat{y}_{k/k}$, provided the inverse exists. It follows immediately from $L = CPC^T(CPC^T +
R)^{-1}$ that

$$ \lim_{R \to 0} L = I. \tag{40} $$

By inspection of (34) and (40), it follows that

$$ \lim_{R \to 0} \sup_{\omega \in \{-\pi, \pi\}} |H_{OE}(e^{j\omega})| = I. \tag{41} $$

Thus, under conditions of diminishing measurement noise, the output estimator
will be devoid of dynamics and its maximum magnitude will approach the identity
matrix. Therefore, for proper, stable, minimum-phase plants, the equaliser
asymptotically approaches the plant inverse as the measurement noise becomes
negligible, that is,

$$ \lim_{R \to 0} H_{IE}(z) = G^{-1}(z). \tag{42} $$

Time-invariant output and input estimation are demonstrated below.


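Example 3. Consider a time-invariant input estimation problem in which the plant
is given by

$$ \begin{aligned}
G(z) &= (z + 0.9)^2(z + 0.1)^{-2} \\
&= (z^2 + 1.8z + 0.81)(z^2 + 0.2z + 0.01)^{-1} \\
&= (1.6z + 0.8)(z^2 + 0.2z + 0.01)^{-1} + 1,
\end{aligned} $$

together with $Q = 1$ and $R = 0.0001$. The controllable canonical form (see Chapter
1) yields the parameters $A = \begin{bmatrix} -0.2 & -0.01 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 1.6 & 0.8 \end{bmatrix}$ and $D = 1$.

From Chapter 4, the corresponding algebraic Riccati equation is $P = APA^T -
K\Omega K^T + BQB^T$, where $K = (APC^T + BQD^T)\Omega^{-1}$ and $\Omega = CPC^T + R + DQD^T$. The
minimum-variance output estimator is calculated as

$$ \begin{bmatrix} \hat{x}_{k+1/k} \\ \hat{y}_{k/k} \end{bmatrix} = \begin{bmatrix} (A - KC) & K \\ (C - LC) & L \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix}, $$

where $L = (CPC^T + DQD^T)\Omega^{-1}$. The solution $P = \begin{bmatrix} 0.0026 & -0.0026 \\ -0.0026 & 0.0026 \end{bmatrix}$ for the
algebraic Riccati equation was found using the Hamiltonian solver within
Matlab®.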


    w = sqrt(Q)*randn(N,1);               % process noise
    x = [0;0];                            % initial state
    for k = 1:N
      y(k) = C*x + D*w(k);                % plant output
      x = A*x + B*w(k);
    end
    v = sqrt(R)*randn(1,N);               % measurement noise
    z = y + v;                            % measurement
    omega = C*P*(C') + D*Q*(D') + R;
    K = (A*P*(C') + B*Q*(D'))/omega;      % predictor gain
    L = Q*(D')/omega;                     % equaliser gain
    x = [0;0];                            % initial state
    for k = 1:N
      w_estimate(k) = - L*C*x + L*z(k);   % equaliser output
      x = (A - K*C)*x + K*z(k);           % predicted state
    end


Fig. 2. Sample trajectories for Example 3: (i) measurement sequence (dotted line); (ii)
actual and estimated process noise sequences (superimposed solid lines).


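The resulting transfer function of the output estimator is

$$ H_{OE}(z) = (z + 0.9)^2(z + 0.9)^{-2}, $$

which illustrates the low-measurement noise asymptote (41). The minimum-variance
input estimator is calculated as

$$ \begin{bmatrix} \hat{x}_{k+1/k} \\ \hat{w}_{k/k} \end{bmatrix} = \begin{bmatrix} (A - KC) & K \\ -LC & L \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix}, $$

where $L = QD^T\Omega^{-1}$. The input estimator transfer function is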


$$ H_{IE}(z) = (z + 0.1)^2(z + 0.9)^{-2}, $$

which corresponds to the inverse of the plant and illustrates the asymptote (42). A
simulation was generated based on the fragment of Matlab® script shown in Fig. 1
and some sample trajectories are provided in Fig. 2. It can be seen from the figure
that the actual and estimated process noise sequences are superimposed, which
demonstrates that an equaliser can be successful when the plant is invertible and
the measurement noise is sufficiently low. In general, when measurement noise is
not insignificant, the asymptotes (41) - (42) will not apply, as the minimum-variance
equaliser solution will involve a trade-off between inverting the plant and
filtering the noise.
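The low-noise asymptote (40) and the trade-off noted above can be illustrated with a scalar sketch. This is not the example from the text: the plant parameters ($A = 0.9$, $B = C = 1$, $Q = 1$, $D = 0$) are assumptions chosen for brevity, and the function name `steady_state_filter_gain` is hypothetical. The steady-state output-estimation gain $L = CPC^T(CPC^T + R)^{-1}$ is computed for decreasing $R$.

```python
def steady_state_filter_gain(a, c, q, r, iterations=1000):
    """Scalar gain L = CPC^T (CPC^T + R)^{-1}, with P found by iterating
    the Riccati difference equation (B = 1 and D = 0 are assumed)."""
    p = q
    for _ in range(iterations):
        omega = c * p * c + r
        p = a * p * a - (a * p * c) ** 2 / omega + q
    return c * p * c / (c * p * c + r)


noise_variances = (1.0, 1e-2, 1e-4, 1e-6)
gains = [steady_state_filter_gain(0.9, 1.0, 1.0, r) for r in noise_variances]
```

As $R$ shrinks the gain approaches unity, so the output estimator passes the measurement through almost unchanged. For the larger values of $R$ the gain is well below one, which reflects the compromise between tracking the measurements and filtering the noise.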


5.7 Frequency Weighted Filtering 


5.7.1 Overview 


Frequency weighting can be used in filter and controller designs to manage 
performance within bands of interest. For example, Grimble et al. [15] employ a 
stable, minimum-phase frequency shaping function to weight the average power 
of the error spectral density in the design of least squares and $H_\infty$ polynomial
filters. Zang et al. defined frequency weighting functions as ratios of prediction 
error and exogenous signal spectra in an iterative controller design [16]. Pots et al. 
chose a frequency weighting function to attenuate noise in a specified frequency 
band within a robust controller [17]. Perceptual weighting functions based on 
ratios of linear prediction coefficients can be used within speech codecs [18]. 


It is shown below that frequency weighting can be applied to improve on the
performance of filters which rely on inexact modelling assumptions. Parameter
uncertainty can also be accommodated explicitly using robust designs as described
in Chapter 9.


5.7.2 Problem Definition


Consider a system $\mathcal{G}$ having the realisation (1) - (2) with measurements (3),
where $y_k \in \mathbb{R}$, $C \in \mathbb{R}^{1 \times n}$ and $D = 0$. A filter solution $\mathcal{H}$ is desired that produces
estimates $\hat{y}_k$ of $y_k$ from the observations so that the energy of the estimation
error

$$ \tilde{y}_k = \mathcal{W}(y_k - \hat{y}_k) \tag{43} $$

is minimised, where $\mathcal{W}: \mathbb{R} \to \mathbb{R}$ is a causal frequency-weighting system that
specifies a band of interest.


The estimation problem is depicted in Fig. 3. It can be seen from the figure that the
error is generated by $\tilde{y} = \mathcal{R}\begin{bmatrix} w \\ v \end{bmatrix}$, where $\mathcal{R} = \mathcal{W}\begin{bmatrix} (\mathcal{G} - \mathcal{HG}) & -\mathcal{H} \end{bmatrix}$, see [21].

Let $(.)^H$ and $(.)^{-H}$ denote the Hermitian conjugate and its inverse. Define
$\Phi_{\tilde{y}\tilde{y}} = \mathcal{R}\begin{bmatrix} \sigma_w^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix}\mathcal{R}^H$ to be the power spectral density of $\tilde{y}$. The performance
objective is to find a solution $\mathcal{H}$ that minimises $\|\tilde{y}\|_2^2$, where $\|.\|_2$ denotes the
2-norm.


Fig. 3. The frequency-weighted estimation problem. The objective is to design a filter $\mathcal{H}$ that
produces estimates $\hat{y}$ that minimise the energy of the frequency-weighted estimation error $\tilde{y}$.


Usually, in the absence of frequency weighting, i.e., $\mathcal{W} = 1$, the optimal filter
minimises $\|\tilde{y}\|_2^2$ provided that the modelling assumptions are correct. In
practice, filters and smoothers are implemented using estimated parameters which
can result in degraded performance. Parameter estimation algorithms have been
previously investigated in [22] - [23] and it was found that the estimates of $A$ and
$\sigma_w^2$ only approach the actual values when the measurement noise is negligible.
Here, the attention is directed to problems where estimates of $A$, $\sigma_w^2$ are inexact
and significant measurement noise of known variance is present. The problem of
interest is to investigate whether a suitable $\mathcal{W}$ can be designed to improve filter
performance.


5.7.3 Optimal Filter Solution


Define $\Delta\Delta^H = \mathcal{G}\mathcal{G}^H\sigma_w^2 + \sigma_v^2$, in which the spectral factor $\Delta: \mathbb{R} \to \mathbb{R}$ is causal,
namely, $\Delta$ and its inverse, $\Delta^{-1}$, are bounded systems that proceed forward in time.
Following the approach of [21], the optimal frequency-weighted smoother and
filter are developed below.


Theorem 1: For the above estimation problem, the filter solution

$$ \mathcal{H} = \mathcal{W}^{-1}\{\mathcal{W}\mathcal{G}\mathcal{G}^H\Delta^{-H}\}_+\Delta^{-1}\sigma_w^2 \tag{44} $$

minimises $\|\tilde{y}\|_2^2$, where $\{.\}_+$ denotes the causal part.

The filter (44) can be written as

$$ \mathcal{WH} = \mathcal{W}(I - \sigma_v^2\{\Delta^{-H}\}_+\Delta^{-1}). \tag{45} $$


5.7.4 Unweighted Filter Realisation 


A realisation for the filter (45) is set out below. The unweighted filtered output
estimates $\hat{y}_k^{(1)} \in \mathbb{R}$ are realised by

$$ \hat{x}_{k+1} = (A - KC)\hat{x}_k + Kz_k, \quad \hat{x}_0 = 0, \tag{46} $$

$$ \hat{y}_k^{(1)} = (C - LC)\hat{x}_k + Lz_k, \tag{47} $$

where $L = CPC^T\Omega^{-1}$ is the filter gain, $K = APC^T\Omega^{-1}$ is the predictor gain,
$\Omega = CPC^T + \sigma_v^2$, in which $P = P^T > 0$ is the solution of the algebraic Riccati
equation $P = APA^T - K\Omega K^T + BB^T\sigma_w^2$.

Denote the sequence of unweighted output estimates from (47) by $\hat{y}^{(1)} = [\hat{y}_1^{(1)}, \ldots,
\hat{y}_N^{(1)}]$. It can be seen from (44) - (45) that frequency weighting can be applied
independently of the filter designs and increases the order of the solutions. In the
interest of minimising calculation cost, procedures for designing a first-order
system, $\mathcal{W}^{-1}: \mathbb{R} \to \mathbb{R}$, which produces frequency-weighted output estimates
from $\hat{y}^{(1)}$ are described in the next section.
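The recursions (46) - (47) can be sketched for a first-order plant as follows. This is a minimal illustration under stated assumptions: $B = C = 1$, the parameter values and the seeded random data are chosen here for demonstration, and the function names `design_filter` and `run_filter` are hypothetical. The Riccati equation is solved by iteration, which is only one of several possible methods.

```python
import random


def design_filter(a, c, var_w, var_v, iterations=1000):
    """Return (K, L) from the steady-state scalar Riccati solution, with B = 1."""
    p = var_w
    for _ in range(iterations):
        omega = c * p * c + var_v
        k = a * p * c / omega
        p = a * p * a - k * omega * k + var_w
    omega = c * p * c + var_v
    return a * p * c / omega, c * p * c / omega


def run_filter(z, a, c, k, l):
    """Unweighted output estimates: (47) for the output, (46) for the state."""
    x_hat, y_hat = 0.0, []
    for zk in z:
        y_hat.append((c - l * c) * x_hat + l * zk)
        x_hat = (a - k * c) * x_hat + k * zk
    return y_hat


# Assumed simulation settings: unit signal variance and unit noise variance.
rng = random.Random(0)
a, c, var_w, var_v, n = 0.9, 1.0, 0.19, 1.0, 5000
x, y, z = 0.0, [], []
for _ in range(n):
    y.append(c * x)
    z.append(c * x + rng.gauss(0.0, var_v ** 0.5))
    x = a * x + rng.gauss(0.0, var_w ** 0.5)

k, l = design_filter(a, c, var_w, var_v)
y_hat = run_filter(z, a, c, k, l)
mse_filter = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / n
mse_measurement = sum((yi - zi) ** 2 for yi, zi in zip(y, z)) / n
```

With exact model parameters the filtered estimates have a lower mean-square error than the raw measurements. The frequency weighting discussed next is aimed at the case where the parameters used in `design_filter` are only estimates.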


5.7.5 Frequency Weighting Method 


Suppose that the filter (46) - (47) designed with estimated $A$ and $\sigma_w^2$ has
produced $\hat{y}^{(1)}$. Let

$$ \tilde{y}^{(j)} = z - \hat{y}^{(j)}, \tag{48} $$

$j = 1$, denote the output estimation error exhibited by an unweighted filter. The
approach taken herein is to assume that a moving-average-order-1 (MA1) system
generates $\tilde{y}^{(j)}$ which can be identified and used to design a frequency weighting.
In particular, assume that

$$ \tilde{y}^{(j)} = \mathcal{W}_j u, \tag{49} $$

where $u$ is a hypothetical stochastic input sequence and $\mathcal{W}_j$ is an identified
frequency weighting system. From the forms (44) and (45), an improved
frequency-weighted estimate is obtained as

$$ \hat{y}^{(j+1)} = \mathcal{W}_j^{-1}\hat{y}^{(j)}. \tag{50} $$

It will be demonstrated in a subsequent example that the above approach may be
repeated for $j = 1, \ldots, M$ iterations of (48) - (50). The combined $\mathcal{W}$ is given by $\mathcal{W} =
\mathcal{W}_1 \cdots \mathcal{W}_M$.


5.7.6 Frequency Weighting Design 


In respect of assumption (49), suppose that $\tilde{y}^{(j)}$ can be generated by an MA1
system

$$ \tilde{y}_k^{(j)} = u_k + b_j u_{k-1}, \tag{51} $$

where $b_j, u_k \in \mathbb{R}$. The unknown $b_j$ and $u_k$ may be estimated from $\tilde{y}^{(j)}$ by
minimising an objective function $F$ within the following procedure.


Procedure 1 [25]: In respect of the MA1 system (51), assume that an initial
estimate $\hat{b}_j^{(1)}$ of $b_j$ is available. Subsequent estimates $\hat{b}_j^{(i+1)}$, $i \ge 1$, are
calculated by repeating the following procedure.

Step 1. Calculate estimates $\hat{u}_k^{(i+1)}$ of $u_k$, $k \in [2, N]$, from

$$ \hat{u}_k^{(i+1)} = \tilde{y}_k^{(j)} - \hat{b}_j^{(i)}\hat{u}_{k-1}^{(i+1)}, \quad \hat{u}_1^{(i+1)} = 0. \tag{52} $$

Then normalise the sequence $\hat{u}^{(i+1)}$ so that $E\{\hat{u}_k^2\} = 1$.


Step 2. Obtain a new estimate of $b_j$ by searching for

$$ \hat{b}_j^{(i+1)} = \arg\min_{\hat{b}_j^{(i+1)} \in (-1, 1)} F. \tag{53} $$

The above iterations may be terminated when the difference between successive
estimates is less than a prescribed tolerance.


Lemma 8 [25]: The search problem in Step 2 is convex. 


. d°F = (i+ [ (i+1) (i+ 
Proof: The claim follows from ay (a))2 + 5(b} DAGDy + 
, z 


R(i+l)p A(i+l)\2 P (i+1) A (i+1)\2 
ABAD YP + (6, -—BO PUY? 


A binary search method is employed within the example that is presented
subsequently. Lemma 8 indicates that the searches will converge to the minimum
of $F$ within a finite number of steps. It is observed below that a reducing sequence
of output estimation error variances results in a sequence of $b_j$ that is
nonincreasing.


Lemma 9: Suppose that there exist output estimation error sequences $\tilde{y}^{(j)}$ and
$\tilde{y}^{(j+1)}$ such that $E\{(\tilde{y}_k^{(j+1)})^2\} \le E\{(\tilde{y}_k^{(j)})^2\}$, then Procedure 1 produces $b_j$, $b_{j+1}$
satisfying $b_{j+1}^2 \le b_j^2$.


Proof: It follows from (52) that Procedure 1 identifies $\hat{u}_k$ and $\hat{b}_j$ that satisfy $\tilde{y}_k^{(j)} =
\hat{u}_k + \hat{b}_j\hat{u}_{k-1}$, which implies

$$ \lim E\{(\tilde{y}_k^{(j)})^2\} = (1 + \hat{b}_j^2)E\{\hat{u}_k^2\}. \tag{54} $$

Since $E\{\hat{u}_k^2\} = 1$ within Step 1 of Procedure 1, the result follows from (54) and
the stated condition.
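The variance relation (54) that underpins Lemma 9 can be checked by simulation. The sketch below is illustrative and does not implement Procedure 1: it assumes a unit-variance Gaussian input, the values of $b$ are chosen for demonstration, and the function names `simulate_ma1` and `sample_mean_square` are hypothetical.

```python
import random


def simulate_ma1(b, n, seed=0):
    """Generate y_k = u_k + b u_{k-1} driven by unit-variance white noise."""
    rng = random.Random(seed)
    u_prev, y = 0.0, []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        y.append(u + b * u_prev)
        u_prev = u
    return y


def sample_mean_square(seq):
    return sum(v * v for v in seq) / len(seq)


n = 50000
coefficients = (-0.06, -0.5, -0.9)
power = {b: sample_mean_square(simulate_ma1(b, n)) for b in coefficients}
predicted = {b: 1.0 + b * b for b in coefficients}   # (1 + b^2) E{u^2}
```

The sample powers track $1 + b^2$, so with the input normalised to unit variance a smaller error variance corresponds to a smaller $b^2$, which is the mechanism used in the proof.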


5.7.7 Frequency Weighted Output Estimation 


A procedure for frequency-weighted output estimation is described below. 


Procedure 2 [25]: Assume that there exist estimates $\hat{A}$ of $A$ and $\hat{\sigma}_w^2$ of $\sigma_w^2$ which
are employed within the filter recursions (46) - (47) to produce unweighted output
estimates $\hat{y}^{(j)}$, $j = 1$, from the measurements $z_k$. Frequency-weighted output
estimates $\hat{y}^{(j+1)}$ may be calculated by carrying out the following steps.


Step 1. From (48), namely, the difference between the measurement and the
current output estimate, use Procedure 1 to identify $b_j$.


Step 2. Realise scaled frequency-weighted output estimates (50) as
$\hat{y}_{k+1}^{(j+1)} = 0.99(1 + b_j)\hat{y}_k^{(j)} - b_j\hat{y}_k^{(j+1)}$.
The above scaling ensures that the inverse frequency weighting transfer
function's gain approaches unity at zero frequency. The 0.99 factor
within the above equation sets the frequency weighting system's induced
norm to less than unity which is needed in a lemma that follows.


Step 3. Optionally go to Step 2 for iterations $j > 1$.


The procedure may be terminated after successive estimates of $y_k$ have
converged.
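The scaling claim in Step 2 can be examined numerically. The following sketch assumes the inverse weighting has the first-order form $W_j^{-1}(z) = 0.99(1 + b_j)(z + b_j)^{-1}$ quoted in the proof of Lemma 10, with the corresponding time-domain recursion. The value $b = -0.06$ is taken from the later application example, while the function names and the frequency grid are hypothetical choices.

```python
import cmath
import math


def inverse_weighting_gain(b, w):
    """Magnitude of 0.99 (1 + b) / (z + b) evaluated at z = exp(jw)."""
    return abs(0.99 * (1.0 + b) / (cmath.exp(1j * w) + b))


def apply_inverse_weighting(y_in, b):
    """Recursion out[k+1] = -b out[k] + 0.99 (1 + b) in[k], with out[0] = 0."""
    out = [0.0]
    for yk in y_in[:-1]:
        out.append(-b * out[-1] + 0.99 * (1.0 + b) * yk)
    return out


b = -0.06
freqs = [math.pi * i / 200 for i in range(201)]   # grid on [0, pi]
gains = [inverse_weighting_gain(b, w) for w in freqs]
peak_gain = max(gains)

# Response to a constant input settles at the zero-frequency gain.
dc_response = apply_inverse_weighting([1.0] * 200, b)[-1]
```

For $-1 < b < 0$ the largest gain on the unit circle occurs at zero frequency and equals 0.99, so the induced norm stays below unity as the procedure requires.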


It is observed below that the above procedure leads to $-1 < b_j < 0$ that can result
in performance benefits. Let $|.|$ and $\|.\|_\infty$ denote the magnitude and $\infty$-norm,
respectively.


Lemma 10 [25]: An identified $-1 < b_j < 0$ implies (i) $\|\tilde{y}^{(j+1)}\|_2 \le \|\tilde{y}^{(j)}\|_2$ and
(ii) $E\{(\tilde{y}_k^{(j+1)})^2\} \le E\{(\tilde{y}_k^{(j)})^2\}$.


Proof: (i) From the definition of $\mathcal{W}_j$ within Step 2 of Procedure 2 and the stated
condition, it follows that the z-domain frequency-weighting transfer function
$W_j^{-1}(z) = 0.99(1 + b_j)(z + b_j)^{-1}$ is low pass and $\|W_j^{-1}(z)\|_\infty =
\sup_{\omega \in (-\pi, \pi)} |W_j^{-1}(e^{j\omega})| = 0.99 < 1$, for which (50) implies $\|\tilde{y}^{(j+1)}\|_2 \le \|\tilde{y}^{(j)}\|_2$.

(ii) Since $w$ and $v$ are zero mean, $\tilde{y}$ is zero mean, $\|\tilde{y}\|_2^2 = NE\{\tilde{y}_k^2\}$ and
the claim follows from (i).


Remark 1: Lemmas 9 and 10 together imply that Procedure 2 produces a sequence
of $b_j$ that is nonincreasing. That is, iterating Step 3 of Procedure 2 results in
diminishing performance benefits.


5.7.8 Frequency Weighting Application 


It suffices to demonstrate frequency weighting with a first-order plant. In
principle, a similar approach could be applied with higher order plants such as in
[22], [23]. Suppose that $A \in \mathbb{R}$, $B = 1$ and $C = 1$. An estimate $\hat{A}$ of $A$ may be
found by minimising the objective function $F = \sum_{k=1}^{N-1}(x_{k+1} - \hat{A}x_k)^2$, see [23].
Setting $\partial F / \partial\hat{A} = 0$ yields the least-squares solution

$$ \hat{A} = \left(\sum_{k=1}^{N-1} x_{k+1}x_k\right)\left(\sum_{k=1}^{N-1} x_k^2\right)^{-1}. \tag{55} $$

It follows from (1) as $k \to \infty$ that $\sigma_x^2 = A^2\sigma_x^2 + \sigma_w^2$, where $\sigma_x^2$ is the sample
variance of $x_k$. Hence an estimate $\hat{\sigma}_w^2$ of $\sigma_w^2$ may be obtained from

$$ \hat{\sigma}_w^2 = (1 - \hat{A}^2)\sigma_x^2. \tag{56} $$

Since $C = 1$, the measurements $z_k$ may be used in lieu of the unknown $x_k$ within
(55) and (56).
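The estimators (55) and (56) are short enough to sketch directly. The code below is an illustration under assumptions made here: a seeded simulation with $A = 0.9$ and $\sigma_w^2 = 0.01$, a shorter record than in the study that follows, and a hypothetical function name `estimate_parameters`. It applies the estimators first to the noise-free states and then to measurements at 0 dB SNR.

```python
import random


def estimate_parameters(x):
    """Least-squares estimate of A as in (55) and noise variance as in (56)."""
    num = sum(x[k + 1] * x[k] for k in range(len(x) - 1))
    den = sum(x[k] ** 2 for k in range(len(x) - 1))
    a_hat = num / den
    mean = sum(x) / len(x)
    var_x = sum((xk - mean) ** 2 for xk in x) / len(x)   # sample variance
    return a_hat, (1.0 - a_hat ** 2) * var_x


rng = random.Random(1)
a, var_w, n = 0.9, 0.01, 20000
var_x = var_w / (1.0 - a ** 2)          # steady-state state variance

x, states = 0.0, []
for _ in range(n):
    states.append(x)
    x = a * x + rng.gauss(0.0, var_w ** 0.5)

# Measurement noise variance equal to the signal variance, i.e. 0 dB SNR.
measurements = [xk + rng.gauss(0.0, var_x ** 0.5) for xk in states]

a_clean, var_w_clean = estimate_parameters(states)
a_noisy, _ = estimate_parameters(measurements)
```

With noise-free states the estimates are close to the true values. Substituting noisy measurements biases $\hat{A}$ towards zero (roughly halving it at 0 dB SNR in this sketch), which is the kind of inexact modelling that motivates the frequency weighting.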


A simulation study was conducted in which measurements were generated using
(1) - (3) with $A = 0.9$, $\sigma_w^2 = 0.01$, normally distributed realisations of $w$ and $v$ of
length $N = 100,000$. Parameter estimates $\hat{A}$, $\hat{\sigma}_w^2$ were obtained from (55) - (56)
at SNRs from 0 to 5 dB. The observed root-mean-square error (RMSE) of the
output estimates produced by Procedure 2 is shown in Fig. 2. It can be seen that a
first-iteration frequency weighted filter (ii) improves on an unweighted filter (i)
and a second iteration of frequency weighting (iii) provides a further
improvement, which illustrates Lemmas 9 - 10.


It was found that repeated frequency weighting iterations provided diminishing
performance improvement. For example, at 0 dB SNR, $\hat{A} = 0.45$, $\hat{\sigma}_w^2 = 0.04$ were
obtained from (55) - (56), and Procedure 2 yielded $\hat{b} = -0.06$, $-0.03$ and $-0.01$ for
the first, second and third iterations of frequency weighted filtering, respectively.


The corresponding magnitude spectra of the inverse frequency weighting transfer
functions are shown in Fig. 3. The figure shows that inverse frequency weightings
are high pass and approach a short circuit as $\hat{b} \to 0$, which is consistent with the
stated interpretation of Lemmas 9 - 10 in Remark 1. Fig. 2 also demonstrates that
iterated frequency-weighting provides additional performance benefits, which is
consistent with Lemmas 9 - 10.


Fig. 2. RMSE versus SNR (dB) for the Example: (i) unweighted
filter; (ii) first iteration frequency-weighted filter;
(iii) second iteration frequency-weighted filter.


Fig. 3. Spectra of $W_j^{-1}$ versus frequency (radians/sec) for the filtering
example at 0 dB SNR: (i) $\hat{b} = -0.06$; (ii) $\hat{b} = -0.03$;
and (iii) $\hat{b} = -0.01$.


5.8 Chapter Summary 


In the linear time-invariant case, it is assumed that the signals and observations
can be described by $x_{k+1} = Ax_k + Bw_k$, $y_k = Cx_k$ and $z_k = y_k + v_k$, respectively, where
the matrices $A$, $B$, $C$, $Q$ and $R$ are constant. The Kalman filter for this problem is
listed in Table 3. If the pair $(A, C)$ is completely observable, the solution of the
corresponding Riccati difference equation monotonically converges to the unique
solution of the algebraic Riccati equation that appears in the table.


The implementation cost is lower than for time-varying problems because the
gains can be calculated before running the filter. If $|\lambda_i(A)| < 1$, $i = 1$ to $n$, and the
pair $(A, C)$ is completely observable, then $|\lambda_i(A - KC)| < 1$, that is, the steady-state
filter is asymptotically stable. The output estimator has the transfer function

$$ H_{OE}(z) = C(I - LC)(zI - A + KC)^{-1}K + CL. $$


Since the task of solving an algebraic Riccati equation is equivalent to spectral
factorisation, the transfer functions of the minimum-mean-square error and
steady-state minimum-variance solutions are the same.


Optimal filters rely on the assumption that the model parameters and noise
statistics are known, otherwise performance degradations can result. It is
demonstrated above that filter estimation errors can be assumed to be generated by
a first-order moving-average system. This assumed system can be identified and
used to design a frequency weighting function to improve mean-square-error
performance. The same remark applies to optimal smoothing.


| ASSUMPTIONS | MAIN RESULTS |
|---|---|
| $E\{w_k\} = E\{v_k\} = 0$. $E\{w_k w_k^T\} = Q$ and $E\{v_k v_k^T\} = R$ are known. $A$, $B$ and $C$ are known. | $x_{k+1} = Ax_k + Bw_k$, $y_k = Cx_k$, $z_k = y_k + v_k$ |
| | $\hat{x}_{k+1/k} = (A - KC)\hat{x}_{k/k-1} + Kz_k$, $\hat{y}_{k/k} = (C - LC)\hat{x}_{k/k-1} + Lz_k$ |
| $Q > 0$, $R > 0$ and $CPC^T + R > 0$. The pair $(A, C)$ is observable. | $K = APC^T(CPC^T + R)^{-1}$, $L = CPC^T(CPC^T + R)^{-1}$, $P = APA^T - K(CPC^T + R)K^T + BQB^T$ |
| Invertible frequency-weighting system | Frequency-weighted filter $\mathcal{WH} = \mathcal{W}(I - \sigma_v^2\{\Delta^{-H}\}_+\Delta^{-1})$, $\mathcal{W} = \mathcal{W}_1 \cdots \mathcal{W}_M$, where $\hat{y}^{(j+1)} = \mathcal{W}_j^{-1}\hat{y}^{(j)}$, iteration $j$ |


Table 3. Main results for time-invariant output estimation.


5.9 Problems


Problem 1. Calculate the observability matrices and comment on the observability 
of the following pairs. 


1 2 - 1 -2 
@ a=|) “) er 4]. ci a| 4) e-k 4]. 


Problem 2. Generalise the proof of Lemma 1 (which addresses the unforced
system $x_{k+1} = Ax_k$ and $y_k = Cx_k$) for the system $x_{k+1} = Ax_k + Bw_k$ and $y_k = Cx_k +
Dw_k$.


Problem 3. Consider the two Riccati difference equations 

Lee AR gaa a APC: (GRegaGe + Ry" Chaat + BOB" 
i AP.,A° ~ APC" (CP,C" +Ry" CP,A + BOB’ : 
Show that a Riccati difference equation for P,, =P, ,—P,, is given by 
ea AP Ay 7 AP gC" (CP,,C* +R, ie CPA, 
where 4, = A,, — 4,,P,,C’(CP,,C’ + R,,)'C,,, and R,, = CP,,C’ +R. 


t+k ttk t+k 


Problem 4. Suppose that measurements are generated by the single-input-single-output
system $x_{k+1} = ax_k + w_k$, $z_k = x_k + v_k$, where $a \in \mathbb{R}$, $E\{v_k\} = 0$, $E\{w_j w_k\}
= (1 - a^2)\delta_{jk}$, $E\{v_j v_k\} = \delta_{jk}$, $E\{w_j v_k\} = 0$.

(a) Find the predicted error variance.

(b) Find the predictor gain.

(c) Verify that the one-step-ahead minimum-variance predictor is realised by

$$ \hat{x}_{k+1/k} = \frac{a}{1 + \sqrt{1 - a^2}}\hat{x}_{k/k-1} + \frac{a\sqrt{1 - a^2}}{1 + \sqrt{1 - a^2}}z_k. $$

(d) Find the filter gain.

(e) Write down the realisation of the minimum-variance filter.

Problem 5. Assuming that a system $G$ has the realisation $x_{k+1} = Ax_k + Bw_k$, $y_k =
Cx_k + Dw_k$, expand $\Delta\Delta^H(z) = GQG^H(z) + R$ to obtain $\Delta(z)$ and the optimal output
estimation filter.


5.10 Glossary


In addition to the terms listed in Section 2.6, the following notation has been used herein.


$A$, $B$, $C$, $D$    A linear time-invariant system is assumed to have the
    realisation $x_{k+1} = Ax_k + Bw_k$ and $y_k = Cx_k + Dw_k$ in which $A$, $B$,
    $C$, $D$ are constant state space matrices of appropriate
    dimension.

$Q$, $R$    Time-invariant covariance matrices of stationary stochastic
    signals $w_k$ and $v_k$, respectively.

Observability matrix.

Observability gramian.

$P$    Steady-state error covariance matrix.

$K$    Time-invariant predictor gain matrix.

$L$    Time-invariant filter gain matrix.

$\Delta(z)$    Spectral factor.

$H_{OE}(z)$    Transfer function matrix of output estimator.

$H_{IE}(z)$    Transfer function matrix of input estimator.

$\mathcal{W}$    An invertible frequency-weighting system that can be applied
    to frequency weight the filtered estimate.


6. Continuous-time Smoothing


6.1 Introduction 


The previously-described minimum-mean-square-error and minimum-variance 
filtering solutions operate on measurements up to the current time. If some 
processing delay can be tolerated then improved estimation performance can be 
realised through the use of smoothers. There are three state-space smoothing 
technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing. 
Fixed-point smoothing refers to estimating some linear combination of states at a 
previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is 
assumed between the measurement and on-line estimation processes. Fixed- 
interval smoothing is for retrospective data analysis, where measurements 
recorded over an interval are used to obtain the improved estimates. Compared to 
filtering, smoothing has a higher implementation cost, as it has increased memory 
and calculation requirements. 


A large number of smoothing solutions have been reported since Wiener’s and 
Kalman’s development of the optimal filtering results — see the early surveys [1] — 
[2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well 
known. Two fixed-interval smoother solutions, namely the maximum-likelihood 
smoother developed by Rauch, Tung and Striebel [3], and the two-filter Fraser- 
Potter formula [4], have been in widespread use since the 1960s. However, the 
minimum-variance fixed-interval smoother is not well known. This smoother is 
simply a time-varying state-space generalisation of the optimal Wiener solution. It 
differs from the Rauch-Tung-Striebel and Fraser-Potter solutions, which may not 
sit well with more orthodox practitioners. 


The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval 
smoothing are canvassed here. It is assumed throughout that the underlying noise 
processes are zero mean and uncorrelated. Nonzero means and correlated 
processes can be handled using the approaches of Chapters 3 and 4. It is also 
assumed here that the noise statistics and state-space model parameters are known 
precisely. Note that techniques for estimating parameters and accommodating 
uncertainty are addressed subsequently. 


Some prerequisite concepts, namely time-varying adjoint systems, backwards
differential equations, Riccati equation comparison and the continuous-time
maximum-likelihood method are covered in Section 6.2. Section 6.3 outlines a
derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother
reported by Sage et al. [6] and Moore [7] is the subject of Section 6.4. Section 6.5
deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance
fixed-interval smoother solutions [8] - [10]. As before, the approach here is to
accompany the developments, where appropriate, with proofs about performance
being attained. Smoothing is not a panacea for all ills. If the measurement noise is
negligible then smoothing (and filtering) may be superfluous. Conversely, if
measurement noise obliterates the signals then data recovery may not be possible.
Therefore, estimator performance is often discussed in terms of the prevailing
signal-to-noise ratio.


6.2 Prerequisites 
6.2.1 Time-varying Adjoint Systems 


Since fixed-interval smoothers employ backward processes, it is pertinent to
introduce the adjoint of a time-varying continuous-time system. Let $\mathcal{G}$ denote a
linear time-varying system

$$ \dot{x}(t) = A(t)x(t) + B(t)w(t), \tag{1} $$

$$ y(t) = C(t)x(t) + D(t)w(t), \tag{2} $$

operating on the interval $[0, T]$. Let $w$ denote the set of $w(t)$ over all time $t$, that is,
$w = \{w(t), t \in [0, T]\}$. Similarly, let $y = \mathcal{G}w$ denote $\{y(t), t \in [0, T]\}$. The adjoint
of $\mathcal{G}$, denoted by $\mathcal{G}^H$, is the unique linear system satisfying

$$ \langle y, \mathcal{G}w \rangle = \langle \mathcal{G}^H y, w \rangle \tag{3} $$

for all $y \in \mathbb{R}^p$ and $w \in \mathbb{R}^m$.

Lemma 1: The adjoint $\mathcal{G}^H$ of the system $\mathcal{G}$ described by (1) - (2), with $x(t_0) = 0$,
having the realisation

$$ \dot{\zeta}(t) = -A^T(t)\zeta(t) - C^T(t)u(t), \tag{4} $$

$$ z(t) = B^T(t)\zeta(t) + D^T(t)u(t), \tag{5} $$

with $\zeta(T) = 0$, satisfies (3).


The proof follows mutatis mutandis from that of Lemma 1 of Chapter 3 and is set
out in [11]. The original system (1) - (2) needs to be integrated forwards in time,
whereas the adjoint system (4) - (5) needs to be integrated backwards in time.
Some important properties of backward systems are discussed in the next section.
The simplification D(t) = 0 is assumed below unless stated otherwise.
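The forward and backward integrations, and the defining property (3), can be illustrated with a scalar time-invariant sketch. Everything specific below is an assumption made for illustration: the parameter values, the test signals, the Euler discretisation and the function names `forward` and `adjoint`. With $D = 0$, the forward system is stepped from $x(0) = 0$ and the adjoint is stepped backwards from $\zeta(T) = 0$, after which the two inner products in (3) are compared.

```python
import math


def forward(w, a, b, c, dt):
    """Euler integration of dx/dt = a x + b w, y = c x, with x(0) = 0."""
    x, y = 0.0, []
    for wk in w:
        y.append(c * x)
        x += dt * (a * x + b * wk)
    return y


def adjoint(u, a, b, c, dt):
    """Euler integration, backwards in time, of -dzeta/dt = a zeta + c u,
    z = b zeta, with terminal condition zeta(T) = 0."""
    zeta, z = 0.0, [0.0] * len(u)
    for i in range(len(u) - 1, -1, -1):
        z[i] = b * zeta
        zeta += dt * (a * zeta + c * u[i])
    return z


# Assumed scalar parameters and arbitrary test signals.
a, b, c, dt, n = -1.0, 2.0, 0.5, 0.001, 5000
t = [k * dt for k in range(n)]
w = [math.sin(3.0 * tk) for tk in t]
u = [math.cos(tk) + 0.5 for tk in t]

lhs = dt * sum(uk * yk for uk, yk in zip(u, forward(w, a, b, c, dt)))   # <u, G w>
rhs = dt * sum(zk * wk for zk, wk in zip(adjoint(u, a, b, c, dt), w))   # <G^H u, w>
```

For this discretisation the backward recursion is the exact adjoint of the forward one, so the two inner products agree to rounding error rather than merely to the order of the step size.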


6.2.2 Backwards Differential Equations 


The adjoint state evolution (4) is rewritten as

$$ -\dot{\zeta}(t) = A^T(t)\zeta(t) + C^T(t)u(t). \tag{6} $$

The negative sign of the derivative within (6) indicates that this differential
equation proceeds backwards in time. The corresponding state transition matrix is
defined below.


Lemma 2: The differential equation (6) has the solution

$$ \zeta(t) = \Phi^H(t, t_0)\zeta(t_0) - \int_{t_0}^{t} \Phi^H(s, t)C^T(s)u(s)\,ds, \tag{7} $$

where the adjoint state transition matrix, $\Phi^H(t, t_0)$, satisfies

$$ \dot{\Phi}^H(t, t_0) = \frac{d\Phi^H(t, t_0)}{dt} = -A^T(t)\Phi^H(t, t_0), \tag{8} $$

with boundary condition

$$ \Phi^H(t, t) = I. \tag{9} $$

Proof: Following the proof of Lemma 1 of Chapter 3, by differentiating (7) and
substituting (4) - (5), it is easily verified that (7) is a solution of (6).


The Lyapunov equation corresponding to (6) is described next because it is
required in the development of backwards Riccati equations.

Lemma 3: In respect of the backwards differential equation (6), assume that $u(t)$
is a zero-mean white process with $E\{u(t)u^T(\tau)\} = U(t)\delta(t - \tau)$ that is uncorrelated
with $\zeta(t_0)$, namely, $E\{u(t)\zeta^T(t_0)\} = 0$. Then the covariances $P(t, \tau) =
E\{\zeta(t)\zeta^T(\tau)\}$ and $\dot{P}(t, \tau) = \frac{dE\{\zeta(t)\zeta^T(\tau)\}}{dt}$ satisfy the Lyapunov differential
equation

$$ -\dot{P}(t, \tau) = A^T(t)P(t, \tau) + P(t, \tau)A(t) - C^T(t)U(t)C(t). \tag{10} $$

Proof: The backwards Lyapunov differential equation (10) can be obtained by
using (6) and (7) within $\frac{dE\{\zeta(t)\zeta^T(\tau)\}}{dt} = E\{\dot{\zeta}(t)\zeta^T(\tau) + \zeta(t)\dot{\zeta}^T(\tau)\}$ (see the
proof of Lemma 2 in Chapter 3).


6.2.3 Comparison of Riccati Equations 
The following Riccati equation comparison theorem is required subsequently to
compare the performance of filters and smoothers.
Theorem 1 (Riccati Equation Comparison Theorem) [12], [8]: Let $P_1(t) \ge 0$ and
$P_2(t) \ge 0$ denote solutions of the Riccati differential equations
PO=AOPR(O+ ROA O-POS OR] 
+B, (1)Q,(t)B) (0) + BOMB" (0) (11) 


and 
P(t) = 4(OP,0+P,O4 (O-P (OS, (OPO 
+B, (t)Q, (t)B; (t) + BOOB" (0) (12) 


with $S_1(t) = C_1^T(t)R_1^{-1}(t)C_1(t)$, $S_2(t) = C_2^T(t)R_2^{-1}(t)C_2(t)$, where $A_1(t)$, $B_1(t)$, $C_1(t)$,
$Q_1(t) \ge 0$, $R_1(t) \ge 0$, $A_2(t)$, $B_2(t)$, $C_2(t)$, $Q_2(t) \ge 0$ and $R_2(t) \ge 0$ are of appropriate
dimensions. If


(i) $P_1(t_0) \ge P_2(t_0)$ for a $t_0 \ge 0$ and


fOW 40]_[A0 40) 
e fe a ab (0) a eee 
Then 
P,(t) > P(t) (13) 


for all t = to. 


Proof: Condition (i) of the theorem is the initial step of an induction argument. For the induction step, denote $\dot{P}_3(t) = \dot{P}_1(t) - \dot{P}_2(t)$, $P_3(t) = P_1(t) - P_2(t)$ and $\bar{A}(t) = A_1(t) - P_2(t)S_1(t) - 0.5P_3(t)S_1(t)$. Then

$$\dot{P}_3(t) = \bar{A}(t)P_3(t) + P_3(t)\bar{A}^T(t) + \begin{bmatrix} I & P_2(t) \end{bmatrix}\left(\begin{bmatrix} B_1(t)Q_1(t)B_1^T(t) & A_1(t) \\ A_1^T(t) & -S_1(t) \end{bmatrix} - \begin{bmatrix} B_2(t)Q_2(t)B_2^T(t) & A_2(t) \\ A_2^T(t) & -S_2(t) \end{bmatrix}\right)\begin{bmatrix} I \\ P_2(t) \end{bmatrix},$$

which together with condition (ii) yields

$$\dot{P}_3(t) \ge \bar{A}(t)P_3(t) + P_3(t)\bar{A}^T(t). \qquad (14)$$

Lemma 5 of Chapter 3 and (14) imply $P_3(t) \ge 0$ and the claim (13) follows.
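
The comparison theorem can be illustrated numerically. The following is a minimal sketch, assuming two hypothetical scalar Riccati equations that share the same $a$ and $s$ and differ only in their driving terms; the function name `riccati_path`, the parameter values and the Euler step are illustrative choices and are not taken from the text.

```python
def riccati_path(a, s, q, p0, dt=1e-3, steps=5000):
    """Euler-integrate the scalar Riccati equation dp/dt = 2*a*p - s*p**2 + q."""
    p, path = p0, [p0]
    for _ in range(steps):
        p += dt * (2.0 * a * p - s * p * p + q)
        path.append(p)
    return path

# Two hypothetical scalar problems that differ only in the driving term q.
p1 = riccati_path(a=-1.0, s=1.0, q=2.0, p0=1.0)
p2 = riccati_path(a=-1.0, s=1.0, q=1.0, p0=0.5)

# Condition (i): p1(t0) >= p2(t0). Condition (ii) holds here because q1 >= q2
# while the remaining block-matrix entries coincide.
ordered = all(x >= y for x, y in zip(p1, p2))
print("P1(t) >= P2(t) on the whole grid:", ordered)
print("final values:", round(p1[-1], 4), round(p2[-1], 4))
```

In this scalar setting the ordering of the initial conditions is preserved along the whole trajectory, which is the behaviour asserted by (13).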


6.2.4 The Maximum-Likelihood Method 


Rauch, Tung and Striebel famously derived their fixed-interval smoother [3] using a maximum-likelihood technique which is outlined as follows. Let $x(t) \sim \mathcal{N}(\mu, R_{xx})$ denote a continuous random variable having a Gaussian (or normal) distribution with mean $E\{x(t)\} = \mu$ and covariance $E\{(x(t) - \mu)(x(t) - \mu)^T\} = R_{xx}$.

The continuous-time Gaussian probability density function of $x(t) \in \mathbb{R}^n$ is defined by

$$p(x(t)) = \frac{1}{(2\pi)^{n/2}|R_{xx}|^{1/2}}\exp\{-0.5\,(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\}, \qquad (15)$$


in which $|R_{xx}|$ denotes the determinant of $R_{xx}$. The probability that the continuous random variable $x(t)$ with a given probability density function $p(x(t))$ lies within an interval $[a, b]$ is given by the likelihood function (which is also known as the cumulative distribution function)

$$P(a \le x(t) \le b) = \int_a^b p(x(t))\,dx. \qquad (16)$$


The Gaussian likelihood function for $x(t)$ is calculated from (15) and (16) as

$$f(x(t)) = \frac{1}{(2\pi)^{n/2}|R_{xx}|^{1/2}}\int_0^T \exp\{-0.5\,(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\}\,dt. \qquad (17)$$
It is often more convenient to work with the log-probability density function

$$\log p(x(t)) = -\log\left((2\pi)^{n/2}|R_{xx}|^{1/2}\right) - 0.5\,(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu) \qquad (18)$$

and the log-likelihood function

$$\log f(x(t)) = -\log\left((2\pi)^{n/2}|R_{xx}|^{1/2}\right) - 0.5\int_0^T (x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\,dt. \qquad (19)$$


Suppose that a given record of $x(t)$ is assumed to belong to a Gaussian distribution that is a function of an unknown quantity $\theta$. A statistical approach for estimating the unknown $\theta$ is the method of maximum likelihood. This typically involves finding an estimate $\hat{\theta}$ that either maximises the log-probability density function

$$\hat{\theta} = \arg\max_{\theta}\,\log p(\theta \mid x(t)) \qquad (20)$$

or maximises the log-likelihood function

$$\hat{\theta} = \arg\max_{\theta}\,\log f(\theta \mid x(t)). \qquad (21)$$

So-called maximum likelihood estimates can be found by setting either $\frac{\partial \log p(\theta \mid x(t))}{\partial\theta}$ or $\frac{\partial \log f(\theta \mid x(t))}{\partial\theta}$ to zero and solving for the unknown $\theta$. Continuous-time maximum likelihood estimation is illustrated by the two examples that follow.


Example 1. Consider the first-order autoregressive system

$$\dot{x}(t) = -a_0 x(t) + w(t), \qquad (22)$$

where $\dot{x}(t) = \frac{dx(t)}{dt}$, $w(t)$ is a zero-mean Gaussian process and $a_0$ is unknown. It follows from (22) that $\dot{x}(t) \sim \mathcal{N}(-a_0 x(t), \sigma_w^2)$, namely,

$$f(\dot{x}(t)) = \frac{1}{(2\pi)^{n/2}\sigma_w}\int_0^T \exp\{-0.5\,(\dot{x}(t) + a_0 x(t))^2\sigma_w^{-2}\}\,dt. \qquad (23)$$

Taking the logarithm of both sides gives

$$\log f(\dot{x}(t)) = -\log\left((2\pi)^{n/2}\sigma_w\right) - 0.5\,\sigma_w^{-2}\int_0^T (\dot{x}(t) + a_0 x(t))^2\,dt. \qquad (24)$$

Setting $\frac{\partial \log f(\dot{x}(t))}{\partial a_0} = 0$ results in $\int_0^T (\dot{x}(t) + a_0 x(t))\,x(t)\,dt = 0$ and hence

$$\hat{a}_0 = -\left(\int_0^T x^2(t)\,dt\right)^{-1}\int_0^T \dot{x}(t)\,x(t)\,dt. \qquad (25)$$
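
The estimate (25) can be tried out on simulated data. The following is a minimal sketch, assuming an Euler-Maruyama discretisation of (22) in which the integrals become sums and $\dot{x}(t)\,dt$ is approximated by the increment $x_{k+1} - x_k$; the true coefficient, noise level, step size, record length and seed are hypothetical choices.

```python
import random

def simulate_ar1(a0, sigma_w, dt, steps, seed=0):
    """Euler-Maruyama simulation of dx/dt = -a0*x + w with noise intensity sigma_w**2."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(steps):
        w = rng.gauss(0.0, sigma_w * dt ** 0.5)   # integrated noise over one step
        x.append(x[-1] - a0 * x[-1] * dt + w)
    return x

def estimate_a0(x, dt):
    """Discretised form of (25): a0_hat = -(int x^2 dt)^-1 * int xdot*x dt."""
    num = sum((x[k + 1] - x[k]) * x[k] for k in range(len(x) - 1))   # xdot * x * dt
    den = sum(x[k] ** 2 * dt for k in range(len(x) - 1))             # x^2 * dt
    return -num / den

dt = 0.01
x = simulate_ar1(a0=1.0, sigma_w=1.0, dt=dt, steps=50_000)
a0_hat = estimate_a0(x, dt)
print(f"estimated a0 = {a0_hat:.3f} (simulated with a0 = 1.0)")
```

The estimate fluctuates around the simulated coefficient, and its spread shrinks as the record length grows.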
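Example 2. Consider the third-order autoregressive system

$$\dddot{x}(t) + a_2\ddot{x}(t) + a_1\dot{x}(t) + a_0 x(t) = w(t), \qquad (26)$$

where $\dddot{x}(t) = \frac{d^3x(t)}{dt^3}$ and $\ddot{x}(t) = \frac{d^2x(t)}{dt^2}$. The above system can be written in a controllable canonical form as

$$\begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \dot{x}_3(t) \end{bmatrix} = \begin{bmatrix} -a_2 & -a_1 & -a_0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{bmatrix} + \begin{bmatrix} w(t) \\ 0 \\ 0 \end{bmatrix}. \qquad (27)$$

Assuming $\dot{x}_1(t) \sim \mathcal{N}(-a_2x_1(t) - a_1x_2(t) - a_0x_3(t), \sigma_w^2)$, taking logarithms, setting to zero the partial derivatives with respect to the unknown coefficients, and rearranging yields

$$\begin{bmatrix} \hat{a}_2 \\ \hat{a}_1 \\ \hat{a}_0 \end{bmatrix} = -\begin{bmatrix} \int_0^T x_1^2\,dt & \int_0^T x_1x_2\,dt & \int_0^T x_1x_3\,dt \\ \int_0^T x_2x_1\,dt & \int_0^T x_2^2\,dt & \int_0^T x_2x_3\,dt \\ \int_0^T x_3x_1\,dt & \int_0^T x_3x_2\,dt & \int_0^T x_3^2\,dt \end{bmatrix}^{-1}\begin{bmatrix} \int_0^T \dot{x}_1x_1\,dt \\ \int_0^T \dot{x}_1x_2\,dt \\ \int_0^T \dot{x}_1x_3\,dt \end{bmatrix}, \qquad (28)$$

in which state time dependence is omitted for brevity.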


6.3 Fixed-Point Smoothing 


6.3.1 Problem Definition 


In continuous-time fixed-point smoothing, it is desired to calculate state estimates
at one particular time of interest, $\tau$, $0 \le \tau \le t$, from measurements $z(t)$ over the
interval $t \in [0, T]$. For example, suppose that a continuous measurement stream of
a tennis ball’s trajectory is available and it is desired to determine whether it 
bounced within the court boundary. In this case, a fixed-point smoother could be 
employed to estimate the ball position at the time of the bounce from the past and 
future measurements. 


A solution for the continuous-time fixed-point smoothing problem can be
developed from first principles, for example, see [5] - [6]. However, it is 
recognised in [13] that a simpler solution derivation follows by transforming the 
smoothing problem into a filtering problem that possesses an augmented state. 
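Following the nomenclature of [14], consider an augmented state vector having two components, namely, $x^{(a)}(t) = \begin{bmatrix} x(t) \\ \zeta(t) \end{bmatrix}$. The first component, $x(t) \in \mathbb{R}^n$, is the state of the system $\dot{x}(t) = A(t)x(t) + B(t)w(t)$ and $y(t) = C(t)x(t)$. The second component, $\zeta(t) \in \mathbb{R}^n$, equals $x(t)$ at time $t = \tau$, that is, $\zeta(t) = x(\tau)$. The corresponding signal model may be written as

$$\dot{x}^{(a)}(t) = A^{(a)}(t)x^{(a)}(t) + B^{(a)}(t)w(t), \qquad (29)$$

$$z(t) = C^{(a)}(t)x^{(a)}(t) + v(t), \qquad (30)$$

where $A^{(a)}(t) = \begin{bmatrix} A(t) & 0 \\ 0 & \delta_{t\tau}A(t) \end{bmatrix}$, $B^{(a)}(t) = \begin{bmatrix} B(t) \\ \delta_{t\tau}B(t) \end{bmatrix}$ and $C^{(a)}(t) = \begin{bmatrix} C(t) & 0 \end{bmatrix}$, in which $\delta_{t\tau} = \begin{cases} 1 & \text{if } t \le \tau, \\ 0 & \text{if } t > \tau \end{cases}$ is the Kronecker delta function. Note that the simplifications $A^{(a)}(t) = \begin{bmatrix} A(t) & 0 \\ 0 & 0 \end{bmatrix}$ and $B^{(a)}(t) = \begin{bmatrix} B(t) \\ 0 \end{bmatrix}$ arise for $t > \tau$. The smoothing objective is to produce an estimate $\hat{\zeta}(t)$ of $\zeta(t)$ from the measurements $z(t)$ over $t \in [0, T]$.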


6.3.2 Solution Derivation 


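Employing the Kalman-Bucy filter recursions for the system (29) - (30) results in

$$\dot{\hat{x}}^{(a)}(t) = A^{(a)}(t)\hat{x}^{(a)}(t) + K^{(a)}(t)\left(z(t) - C^{(a)}(t)\hat{x}^{(a)}(t)\right) = \left(A^{(a)}(t) - K^{(a)}(t)C^{(a)}(t)\right)\hat{x}^{(a)}(t) + K^{(a)}(t)z(t), \qquad (31)$$

where

$$K^{(a)}(t) = P^{(a)}(t)\left(C^{(a)}(t)\right)^T R^{-1}(t), \qquad (32)$$

in which $P^{(a)}(t) \in \mathbb{R}^{2n \times 2n}$ is to be found. Consider the partitioning $K^{(a)}(t) = \begin{bmatrix} K(t) \\ K_\zeta(t) \end{bmatrix}$, then for $t > \tau$, (31) may be written as

$$\begin{bmatrix} \dot{\hat{x}}(t|t) \\ \dot{\hat{\zeta}}(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -K_\zeta(t)C(t) & 0 \end{bmatrix}\begin{bmatrix} \hat{x}(t|t) \\ \hat{\zeta}(t) \end{bmatrix} + \begin{bmatrix} K(t) \\ K_\zeta(t) \end{bmatrix}z(t). \qquad (33)$$

Define the augmented error state as $\tilde{x}^{(a)}(t) = x^{(a)}(t) - \hat{x}^{(a)}(t)$, that is,

$$\begin{bmatrix} \tilde{x}(t|t) \\ \tilde{\zeta}(t) \end{bmatrix} = \begin{bmatrix} x(t) \\ \zeta(t) \end{bmatrix} - \begin{bmatrix} \hat{x}(t|t) \\ \hat{\zeta}(t) \end{bmatrix}. \qquad (34)$$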


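Differentiating (34) and using $z(t) = C(t)x(t) + v(t)$ gives

$$\begin{bmatrix} \dot{\tilde{x}}(t|t) \\ \dot{\tilde{\zeta}}(t) \end{bmatrix} = \begin{bmatrix} \dot{x}(t) \\ 0 \end{bmatrix} - \begin{bmatrix} \dot{\hat{x}}(t|t) \\ \dot{\hat{\zeta}}(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -K_\zeta(t)C(t) & 0 \end{bmatrix}\begin{bmatrix} \tilde{x}(t|t) \\ \tilde{\zeta}(t) \end{bmatrix} + \begin{bmatrix} B(t) & -K(t) \\ 0 & -K_\zeta(t) \end{bmatrix}\begin{bmatrix} w(t) \\ v(t) \end{bmatrix}. \qquad (35)$$

Denote $P^{(a)}(t) = \begin{bmatrix} P(t) & \Sigma^T(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix}$, where $P(t) = E\{[x(t) - \hat{x}(t|t)][x(t) - \hat{x}(t|t)]^T\}$, $\Omega(t) = E\{[\zeta(t) - \hat{\zeta}(t)][\zeta(t) - \hat{\zeta}(t)]^T\}$ and $\Sigma(t) = E\{[\zeta(t) - \hat{\zeta}(t)][x(t) - \hat{x}(t|t)]^T\}$. Applying Lemma 2 of Chapter 3 to (35) yields the Lyapunov differential equation

$$\begin{bmatrix} \dot{P}(t) & \dot{\Sigma}^T(t) \\ \dot{\Sigma}(t) & \dot{\Omega}(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -K_\zeta(t)C(t) & 0 \end{bmatrix}\begin{bmatrix} P(t) & \Sigma^T(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix} + \begin{bmatrix} P(t) & \Sigma^T(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix}\begin{bmatrix} A^T(t) - C^T(t)K^T(t) & -C^T(t)K_\zeta^T(t) \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} B(t) & -K(t) \\ 0 & -K_\zeta(t) \end{bmatrix}\begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix}\begin{bmatrix} B^T(t) & 0 \\ -K^T(t) & -K_\zeta^T(t) \end{bmatrix}.$$

Simplifying the above differential equation yields

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t), \qquad (36)$$

$$\dot{\Sigma}(t) = \Sigma(t)\left(A^T(t) - C^T(t)K^T(t)\right), \qquad (37)$$

$$\dot{\Omega}(t) = -\Sigma(t)C^T(t)R^{-1}(t)C(t)\Sigma^T(t). \qquad (38)$$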


Equations (37) - (38) can be initialised with

$$\Sigma(\tau) = P(\tau). \qquad (39)$$

Thus, the fixed-point smoother estimate is given by

$$\dot{\hat{\zeta}}(t) = \Sigma(t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t|t)\right), \qquad (40)$$

which is initialised with $\hat{\zeta}(\tau) = \hat{x}(\tau)$. Alternative derivations of (40) are presented in [5], [8], [15]. The smoother (40) and its associated error covariances (36) - (38) are also discussed in [16], [17].
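
The covariance recursions (36) - (38) can be explored numerically. The following is a minimal sketch, assuming a hypothetical scalar time-invariant model ($a = -1$, $b = \sqrt{2}$, $c = q = r = 1$) whose filter has already reached its steady state at the time of interest $\tau$, Euler integration, and the additional assumption that the smoother error covariance starts from the filter value, $\Omega(\tau) = P(\tau)$.

```python
import math

# Assumed scalar model parameters (illustrative, not prescribed by the text).
a, b, c, q, r = -1.0, math.sqrt(2.0), 1.0, 1.0, 1.0
dt, steps = 1e-3, 10_000          # integrate for 10 s beyond tau

p = math.sqrt(3.0) - 1.0          # steady-state solution of (36) for this model
sigma = p                         # initialisation (39): Sigma(tau) = P(tau)
omega = p                         # assumption: Omega(tau) = P(tau)
history = [omega]

for _ in range(steps):
    k = p * c / r                                        # Kalman gain, scalar case
    p += dt * (2.0 * a * p - (p * c) ** 2 / r + b * q * b)   # (36)
    sigma_next = sigma + dt * sigma * (a - c * k)        # (37)
    omega += dt * (-(sigma * c) ** 2 / r)                # (38)
    sigma = sigma_next
    history.append(omega)

print(f"filter variance P = {p:.4f}, smoothed variance Omega = {history[-1]:.4f}")
```

For these values the smoothed error variance falls from $\sqrt{3} - 1$ towards $1/\sqrt{3}$ as later measurements are absorbed, and most of the reduction occurs within a few filter time constants.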


6.3.3. Performance 


It can be seen that the right-hand-side of the smoother error covariance (38) is non-positive and therefore $\Omega(t)$ must be monotonically decreasing. That is, the smoothed estimates improve with time. However, since the right-hand-side of (36) varies inversely with $R(t)$, the improvement reduces with decreasing signal-to-noise ratio. It is shown below that the fixed-point smoother improves on the performance of the minimum-variance filter.

Lemma 4: In respect of the fixed-point smoother (40),

$$P(t) \ge \Omega(t). \qquad (41)$$


Proof: The initialisation (39) accords with condition (i) of Theorem 1. Condition 
(ii) of the theorem is satisfied since 


be A(t) le -C’()R"(HC()0 0 
A’(t) -C'(t)R'(HC(t) | ~ 0 0 


and hence the claim (41) follows. 


6.4 Fixed-Lag Smoothing 


6.4.1 Problem Definition 


For continuous-time estimation problems, as usual, it is assumed that the observations are modelled by $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$, with $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$ and $E\{v(t)v^T(\tau)\} = R(t)\delta(t - \tau)$. In fixed-lag smoothing, it is desired to calculate state estimates at a fixed time lag behind the current measurements. That is, smoothed state estimates, $\hat{x}(t|t+\tau)$, are desired at time $t$, given data at time $t + \tau$, where $\tau$ is a prescribed lag. In particular, fixed-lag smoother estimates are sought which minimise $E\{[x(t) - \hat{x}(t|t+\tau)][x(t) - \hat{x}(t|t+\tau)]^T\}$. It is found in [18] that the smoother yields practically all the improvement over the minimum-variance filter when the smoothing lag equals several time constants associated with the minimum-variance filter for the problem.


6.4.2 Solution Derivation 


Previously, augmented signal models together with the application of the standard 
Kalman filter recursions were used to obtain the smoother results. However, as 
noted in [19], it is difficult to derive the optimal continuous-time fixed-lag 
smoother in this way because an ideal delay operator cannot easily be included 
within an asymptotically stable state-space system. Consequently, an alternate 
derivation based on that in [6] is outlined in the following. Recall that the gain of the minimum-variance filter is calculated as $K(t) = P(t)C^T(t)R^{-1}(t)$, where $P(t)$ is the solution of the Riccati equation (36). Let $\Phi(\tau,t)$ denote the transition matrix of the filter error system $\dot{\tilde{x}}(t|t) = (A(t) - K(t)C(t))\tilde{x}(t|t) + B(t)w(t) - K(t)v(t)$, that is,

$$\dot{\Phi}(\tau,s) = \left(A(\tau) - K(\tau)C(\tau)\right)\Phi(\tau,s) \qquad (42)$$

and $\Phi(s,s) = I$. It is assumed in [6], [17], [18], [20] that a smoothed estimate $\hat{x}(t|t+\tau)$ of $x(t)$ is obtained as

$$\hat{x}(t|t+\tau) = \hat{x}(t|t) + P(t)\xi(t,t+\tau), \qquad (43)$$

where

$$\xi(t,t+\tau) = \int_t^{t+\tau}\Phi^T(s,t)C^T(s)R^{-1}(s)\left(z(s) - C(s)\hat{x}(s|s)\right)ds. \qquad (44)$$

The formula (43) appears in the development of fixed interval smoothers [21] - [22], in which case $\xi(t)$ is often called an adjoint variable. From the use of Leibniz' rule, that is,

$$\frac{d}{dt}\int_{a(t)}^{b(t)} f(t,s)\,ds = f(t,b(t))\frac{db(t)}{dt} - f(t,a(t))\frac{da(t)}{dt} + \int_{a(t)}^{b(t)}\frac{\partial}{\partial t}f(t,s)\,ds,$$


it can be found that

$$\dot{\xi}(t,t+\tau) = \Phi^T(t+\tau,t)C^T(t+\tau)R^{-1}(t+\tau)\left(z(t+\tau) - C(t+\tau)\hat{x}(t+\tau|t+\tau)\right) - C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t|t)\right) - \left(A(t) - K(t)C(t)\right)^T\xi(t,t+\tau). \qquad (45)$$

Differentiating (43) with respect to $t$ gives

$$\dot{\hat{x}}(t|t+\tau) = \dot{\hat{x}}(t|t) + \dot{P}(t)\xi(t,t+\tau) + P(t)\dot{\xi}(t,t+\tau). \qquad (46)$$

Substituting $\xi(t,t+\tau) = P^{-1}(t)\left(\hat{x}(t|t+\tau) - \hat{x}(t|t)\right)$ and expressions for $\dot{\hat{x}}(t|t)$, $\dot{P}(t)$, $\dot{\xi}(t,t+\tau)$ into (46) yields the fixed-lag smoother differential equation

$$\dot{\hat{x}}(t|t+\tau) = A(t)\hat{x}(t|t+\tau) + B(t)Q(t)B^T(t)P^{-1}(t)\left(\hat{x}(t|t+\tau) - \hat{x}(t|t)\right) + P(t)\Phi^T(t+\tau,t)C^T(t+\tau)R^{-1}(t+\tau)\left(z(t+\tau) - C(t+\tau)\hat{x}(t+\tau|t+\tau)\right). \qquad (47)$$


6.4.3 Performance 
Lemma 5 [18]:

$$P(t) - E\{[x(t) - \hat{x}(t|t+\tau)][x(t) - \hat{x}(t|t+\tau)]^T\} > 0. \qquad (48)$$

Proof: It is argued from the references of [18] for the fixed-lag smoothed estimate that

$$E\{[x(t) - \hat{x}(t|t+\tau)][x(t) - \hat{x}(t|t+\tau)]^T\} = P(t) - P(t)\int_t^{t+\tau}\Phi^T(s,t)C^T(s)R^{-1}(s)C(s)\Phi(s,t)\,ds\,P(t). \qquad (49)$$

Thus, (48) follows by inspection of (49).

That is to say, the minimum-variance filter error covariance is greater than the fixed-lag smoother error covariance. It is also argued in [18] that (48) implies the error covariance decreases monotonically with the smoother lag $\tau$.
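
The dependence of the error covariance on the lag can be made concrete in a scalar case. The following is a minimal sketch, assuming a hypothetical time-invariant model in steady state for which the filter error variance is $p = \sqrt{3} - 1$ and the filter error system has the pole $a - kc = -\sqrt{3}$, so that the transition matrix in (49) is an exponential and the integral has a closed form; the function name `fixed_lag_variance` and the lag values are illustrative.

```python
import math

p = math.sqrt(3.0) - 1.0     # assumed steady-state filter error variance
f = -math.sqrt(3.0)          # assumed pole a - k*c of the filter error system
c, r = 1.0, 1.0

def fixed_lag_variance(lag):
    """Scalar, time-invariant evaluation of the right-hand side of (49)."""
    # Phi(s, t) = exp(f * (s - t)), so the integral over [t, t + lag] is closed form.
    integral = (c * c / r) * (1.0 - math.exp(2.0 * f * lag)) / (-2.0 * f)
    return p - p * integral * p

lags = [0.0, 0.1, 0.3, 1.0, 3.0]
variances = [fixed_lag_variance(lag) for lag in lags]
for lag, var in zip(lags, variances):
    print(f"lag = {lag:4.1f} s  ->  error variance = {var:.4f}")
```

With a filter time constant of about 0.58 s, a lag of a few time constants already recovers nearly all of the available improvement, which accords with the observation attributed to [18] in the problem definition.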


6.5 Fixed-Interval Smoothing


6.5.1 Problem Definition 


Many data analyses occur off-line. In medical diagnosis for example, reviews of 
ultra-sound or CAT scan images are delayed after the time of measurement. In 
principle, smoothing could be employed instead of filtering for improving the 
quality of an image sequence. 


Fixed-lag smoothers are elegant — they can provide a small performance 
improvement over filters at moderate increase in implementation cost. The best 
performance arises when the lag is sufficiently large, at the expense of increased 
complexity. Thus, the designer needs to trade off performance, calculation cost 
and delay. 


Fixed-interval smoothers are a brute-force solution for estimation problems. They 
provide improved performance without having to fine tune a smoothing lag, at the 
cost of approximately twice the filter calculation complexity. Fixed interval 
smoothers involve two passes. Typically, a forward process operates on the 
measurements. Then a backward system operates on the results of the forward 
process. 


The plants are again assumed to have state-space realisations of the form $\dot{x}(t) = A(t)x(t) + B(t)w(t)$ and $y(t) = C(t)x(t) + D(t)w(t)$. Smoothers are considered which operate on measurements $z(t) = y(t) + v(t)$ over a fixed interval $t \in [0, T]$. The performance criteria depend on the quantity being estimated, viz.,

• in input estimation, the objective is to calculate a $\hat{w}(t|T)$ that minimises $E\{[w(t) - \hat{w}(t|T)][w(t) - \hat{w}(t|T)]^T\}$;

• in state estimation, $\hat{x}(t|T)$ is calculated which achieves the minimum $E\{[x(t) - \hat{x}(t|T)][x(t) - \hat{x}(t|T)]^T\}$; and

• in output estimation, $\hat{y}(t|T)$ is produced such that $E\{[y(t) - \hat{y}(t|T)][y(t) - \hat{y}(t|T)]^T\}$ is minimised.


This section focuses on three continuous-time fixed-interval smoother
formulations: the maximum-likelihood smoother derived by Rauch, Tung and
Striebel [3], the Fraser-Potter smoother [4] and a generalisation of Wiener’s
optimal unrealisable solution [8] - [10]. Some additional historical background to
[3] - [4] is described within [1], [2], [17].


6.5.2 The Maximum Likelihood Smoother
6.5.2.1 Solution Derivation 


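Rauch, Tung and Striebel [3] employed the maximum-likelihood method to develop a discrete-time smoother for state estimation and then used a limiting argument to obtain a continuous-time version. A brief outline of this derivation is set out here. Suppose that a record of filtered estimates, $\hat{x}(\tau|\tau)$, is available over a fixed interval $\tau \in [0, T]$. Let $\hat{x}(\tau|T)$ denote smoothed state estimates at time $0 \le \tau \le T$ to be evolved backwards in time from filtered states $\hat{x}(\tau|\tau)$. The smoother development is based on two assumptions. First, it is assumed that $-\dot{\hat{x}}(\tau|T)$ is normally distributed with mean $A(\tau)\hat{x}(\tau|T)$ and covariance $B(\tau)Q(\tau)B^T(\tau)$, that is, $-\dot{\hat{x}}(\tau|T) \sim \mathcal{N}(A(\tau)\hat{x}(\tau|T), B(\tau)Q(\tau)B^T(\tau))$. The probability density function of $-\dot{\hat{x}}(\tau|T)$ is

$$p(-\dot{\hat{x}}(\tau|T) \mid \hat{x}(\tau|T)) = \frac{1}{(2\pi)^{n/2}|B(\tau)Q(\tau)B^T(\tau)|^{1/2}} \exp\{-0.5\,(-\dot{\hat{x}}(\tau|T) - A(\tau)\hat{x}(\tau|T))^T (B(\tau)Q(\tau)B^T(\tau))^{-1}(-\dot{\hat{x}}(\tau|T) - A(\tau)\hat{x}(\tau|T))\}.$$

Second, it is assumed that $\hat{x}(\tau|T)$ is normally distributed with mean $\hat{x}(\tau|\tau)$ and covariance $P(\tau)$, namely, $\hat{x}(\tau|T) \sim \mathcal{N}(\hat{x}(\tau|\tau), P(\tau))$. The corresponding probability density function is

$$p(\hat{x}(\tau|T) \mid \hat{x}(\tau|\tau)) = \frac{1}{(2\pi)^{n/2}|P(\tau)|^{1/2}} \exp\{-0.5\,(\hat{x}(\tau|T) - \hat{x}(\tau|\tau))^T P^{-1}(\tau)(\hat{x}(\tau|T) - \hat{x}(\tau|\tau))\}.$$

From the approach of [3] and the further details in [6],

$$0 = \frac{\partial \log\left[p(-\dot{\hat{x}}(\tau|T) \mid \hat{x}(\tau|T))\,p(\hat{x}(\tau|T) \mid \hat{x}(\tau|\tau))\right]}{\partial\hat{x}(\tau|T)} = \frac{\partial \log p(-\dot{\hat{x}}(\tau|T) \mid \hat{x}(\tau|T))}{\partial\hat{x}(\tau|T)} + \frac{\partial \log p(\hat{x}(\tau|T) \mid \hat{x}(\tau|\tau))}{\partial\hat{x}(\tau|T)}$$

results in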


$$0 = \frac{\partial(-\dot{\hat{x}}(\tau|T) - A(\tau)\hat{x}(\tau|T))^T}{\partial\hat{x}(\tau|T)}\,(B(\tau)Q(\tau)B^T(\tau))^{-1}(-\dot{\hat{x}}(\tau|T) - A(\tau)\hat{x}(\tau|T)) + P^{-1}(\tau)(\hat{x}(\tau|T) - \hat{x}(\tau|\tau)).$$

Hence, the solution is given by

$$-\dot{\hat{x}}(\tau|T) = A(\tau)\hat{x}(\tau|T) + G(\tau)\left(\hat{x}(\tau|T) - \hat{x}(\tau|\tau)\right), \qquad (50)$$

where

$$G(\tau) = -B(\tau)Q(\tau)B^T(\tau)\left[\frac{\partial(-\dot{\hat{x}}(\tau|T) - A(\tau)\hat{x}(\tau|T))^T}{\partial\hat{x}(\tau|T)}\right]^{-1}P^{-1}(\tau) \qquad (51)$$

is the smoother gain. Suppose that $\hat{x}(\tau|T)$, $A(\tau)$, $B(\tau)$, $Q(\tau)$, $P^{-1}(\tau)$ are sampled at integer $k$ multiples of $T_s$ and are constant during the sampling interval. Using the Euler approximation $-\dot{\hat{x}}(kT_s|T) = \frac{\hat{x}((k+1)T_s|T) - \hat{x}(kT_s|T)}{T_s}$, the sampled gain may be written as

$$G(kT_s) = B(kT_s)T_s^{-1}Q(kT_s)B^T(kT_s)(I + AT_s)^{-1}P^{-1}(kT_s). \qquad (52)$$

Recognising that $T_s^{-1}Q(kT_s) = Q(\tau)$, see [23], and taking the limit as $T_s \to 0$ yields

$$G(\tau) = B(\tau)Q(\tau)B^T(\tau)P^{-1}(\tau). \qquad (53)$$


To summarise, the above fixed-interval smoother is realised by the following two- 
pass procedure. 

(i) In the first pass, the (forward) Kalman-Bucy filter operates on measurements $z(\tau)$ to obtain state estimates $\hat{x}(\tau|\tau)$.

(ii) In the second pass, the differential equation (50) operates on the filtered state estimates $\hat{x}(\tau|\tau)$ to obtain smoothed state estimates $\hat{x}(\tau|T)$. Equation (50) is integrated backwards in time from the initial condition $\hat{x}(\tau|T) = \hat{x}(\tau|\tau)$ at $\tau = T$.


Alternative derivations of this smoother appear in [6], [20], [23], [24]. 


6.5.2.2 Alternative Form


For the purpose of developing an alternate form of the above smoother found in the literature, consider a fictitious forward version of (50), namely,

$$\dot{\hat{x}}(t|T) = A(t)\hat{x}(t|T) + B(t)Q(t)B^T(t)P^{-1}(t)\left(\hat{x}(t|T) - \hat{x}(t|t)\right) = A(t)\hat{x}(t|T) + B(t)Q(t)B^T(t)\xi(t|T), \qquad (54)$$

where

$$\xi(t|T) = P^{-1}(t)\left(\hat{x}(t|T) - \hat{x}(t|t)\right) \qquad (55)$$

is an auxiliary variable. An expression for the evolution of $\xi(t|T)$ is now developed. Writing (55) as

$$\hat{x}(t|T) = \hat{x}(t|t) + P(t)\xi(t|T) \qquad (56)$$

and taking the time differential results in

$$\dot{\hat{x}}(t|T) = \dot{\hat{x}}(t|t) + \dot{P}(t)\xi(t|T) + P(t)\dot{\xi}(t|T). \qquad (57)$$

Substituting $\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + P(t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t|t)\right)$ into (57) yields

$$P(t)\dot{\xi}(t|T) = P(t)C^T(t)R^{-1}(t)C(t)\hat{x}(t|t) - P(t)C^T(t)R^{-1}(t)z(t) + A(t)P(t)\xi(t|T) + B(t)Q(t)B^T(t)\xi(t|T) - \dot{P}(t)\xi(t|T). \qquad (58)$$

Using $\hat{x}(t|t) = \hat{x}(t|T) - P(t)\xi(t|T)$ and $-P(t)A^T(t) = A(t)P(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t) - \dot{P}(t)$ within (58) and rearranging gives

$$-\dot{\xi}(t|T) = -C^T(t)R^{-1}(t)C(t)\hat{x}(t|T) + A^T(t)\xi(t|T) + C^T(t)R^{-1}(t)z(t). \qquad (59)$$

The filter (54) and smoother (59) may be collected together as

$$\begin{bmatrix} \dot{\hat{x}}(t|T) \\ -\dot{\xi}(t|T) \end{bmatrix} = \begin{bmatrix} A(t) & B(t)Q(t)B^T(t) \\ -C^T(t)R^{-1}(t)C(t) & A^T(t) \end{bmatrix}\begin{bmatrix} \hat{x}(t|T) \\ \xi(t|T) \end{bmatrix} + \begin{bmatrix} 0 \\ C^T(t)R^{-1}(t) \end{bmatrix}z(t). \qquad (60)$$

Equation (60) is known as the Hamiltonian form of the Rauch-Tung-Striebel smoother [17].


6.5.2.3 Performance 


In order to develop an expression for the smoothed error state, consider the backwards signal model

$$-\dot{x}(\tau) = A(\tau)x(\tau) + B(\tau)w(\tau). \qquad (61)$$

Subtracting (50) from (61) results in

$$-\dot{x}(\tau) + \dot{\hat{x}}(\tau|T) = (A(\tau) + G(\tau))\left(x(\tau) - \hat{x}(\tau|T)\right) - G(\tau)\left(x(\tau) - \hat{x}(\tau|\tau)\right) + B(\tau)w(\tau). \qquad (62)$$

Let $\tilde{x}(\tau|T) = x(\tau) - \hat{x}(\tau|T)$ denote the smoothed error state and $\tilde{x}(\tau|\tau) = x(\tau) - \hat{x}(\tau|\tau)$ denote the filtered error state. Then the differential equation (62) can simply be written as

$$-\dot{\tilde{x}}(\tau|T) = (A(\tau) + G(\tau))\tilde{x}(\tau|T) - G(\tau)\tilde{x}(\tau|\tau) + B(\tau)w(\tau), \qquad (63)$$

where $-\dot{\tilde{x}}(\tau|T) = -(\dot{x}(\tau) - \dot{\hat{x}}(\tau|T))$. Applying Lemma 3 to (63) and using $E\{\tilde{x}(\tau|\tau)w^T(\tau)\} = 0$ gives

$$-\dot{\Sigma}(\tau|T) = (A(\tau) + G(\tau))\Sigma(\tau|T) + \Sigma(\tau|T)(A(\tau) + G(\tau))^T - B(\tau)Q(\tau)B^T(\tau), \qquad (64)$$

where $\Sigma(\tau|T) = E\{\tilde{x}(\tau|T)\tilde{x}^T(\tau|T)\}$ is the smoother error covariance and $\dot{\Sigma}(\tau|T) = \frac{d\Sigma(\tau|T)}{d\tau}$. The smoother error covariance differential equation (64) is solved backwards in time from the initial condition

$$\Sigma(\tau|T) = P(\tau|\tau) \qquad (65)$$

at $\tau = T$, where $P(\tau|\tau)$ is the solution of the Riccati differential equation

$$\dot{P}(t) = (A(t) - K(t)C(t))P(t) + P(t)(A^T(t) - C^T(t)K^T(t)) + K(t)R(t)K^T(t) + B(t)Q(t)B^T(t) = A(t)P(t) + P(t)A^T(t) - K(t)R(t)K^T(t) + B(t)Q(t)B^T(t). \qquad (66)$$


It is shown below that this smoother outperforms the minimum-variance filter. For the purpose of comparing the solutions of forward Riccati equations, consider a fictitious forward version of (64), namely,

$$\dot{\Sigma}(t|T) = (A(t) + G(t))\Sigma(t|T) + \Sigma(t|T)(A(t) + G(t))^T - B(t)Q(t)B^T(t) \qquad (67)$$

initialised with

$$\Sigma(t_0|T) = P(t_0|t_0) \ge 0. \qquad (68)$$

Lemma 6: In respect of the fixed-interval smoother (50),

$$P(t|t) \ge \Sigma(t|T). \qquad (69)$$

Proof: The initialisation (68) satisfies condition (i) of Theorem 1. Condition (ii) of the theorem is met since

$$\begin{bmatrix} B(t)Q(t)B^T(t) + K(t)R(t)K^T(t) & A(t) - K(t)C(t) \\ A^T(t) - C^T(t)K^T(t) & 0 \end{bmatrix} \ge \begin{bmatrix} -B(t)Q(t)B^T(t) & A(t) + G(t) \\ A^T(t) + G^T(t) & 0 \end{bmatrix}$$

for all $t \ge t_0$ and hence the claim (69) follows.


6.5.3. The Fraser-Potter Smoother 


The Central Limit Theorem states that the mean of a sufficiently large sample of 
independent identically distributed random variables will be approximately 
normally distributed [25]. The same is true of partial sums of random variables. 
The Central Limit Theorem is illustrated by the first part of the following lemma. 
A useful generalisation appears in the second part of the lemma. 


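Lemma 7: Suppose that $y_1, y_2, \ldots, y_n$ are independent random variables and $W_1, W_2, \ldots, W_n$ are independent positive definite weighting matrices. Let $\mu = E\{y\}$, $u = y_1 + y_2 + \ldots + y_n$ and

$$v = (W_1y_1 + W_2y_2 + \ldots + W_ny_n)(W_1 + W_2 + \ldots + W_n)^{-1}. \qquad (70)$$

(i) If $y_i \sim \mathcal{N}(\mu, R)$, $i = 1$ to $n$, then $u \sim \mathcal{N}(n\mu, nR)$;
(ii) If $y_i \sim \mathcal{N}(0, I)$, $i = 1$ to $n$, then $v \sim \mathcal{N}(0, I)$.

Proof:

(i) $E\{u\} = E\{y_1\} + E\{y_2\} + \ldots + E\{y_n\} = n\mu$. $E\{(u - n\mu)(u - n\mu)^T\} = E\{(y_1 - \mu)(y_1 - \mu)^T\} + E\{(y_2 - \mu)(y_2 - \mu)^T\} + \ldots + E\{(y_n - \mu)(y_n - \mu)^T\} = nR$.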


(ii) $E\{v\} = W_1(W_1 + W_2 + \ldots + W_n)^{-1}E\{y_1\} + W_2(W_1 + W_2 + \ldots + W_n)^{-1}E\{y_2\} + \ldots + W_n(W_1 + W_2 + \ldots + W_n)^{-1}E\{y_n\} = 0$. $E\{vv^T\} = E\{(W_1^T + W_2^T + \ldots + W_n^T)^{-1}W_1^Ty_1y_1^TW_1(W_1 + W_2 + \ldots + W_n)^{-1}\} + E\{(W_1^T + W_2^T + \ldots + W_n^T)^{-1}W_2^Ty_2y_2^TW_2(W_1 + W_2 + \ldots + W_n)^{-1}\} + \ldots + E\{(W_1^T + W_2^T + \ldots + W_n^T)^{-1}W_n^Ty_ny_n^TW_n(W_1 + W_2 + \ldots + W_n)^{-1}\} = (W_1^T + W_2^T + \ldots + W_n^T)^{-1}(W_1^TW_1 + W_2^TW_2 + \ldots + W_n^TW_n)(W_1 + W_2 + \ldots + W_n)^{-1} = I$.


Fraser and Potter reported a smoother in 1969 [4] that combined state estimates 
from forward and backward filters using a formula similar to (70) truncated at n = 
2. The inverses of the forward and backward error covariances, which are 
indicative of the quality of the respective estimates, were used as weighting 
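matrices. The combined filter and Fraser-Potter smoother equations are

$$\dot{\hat{x}}(t|t) = A(t)\hat{x}(t|t) + P(t|t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\hat{x}(t|t)\right), \qquad (71)$$

$$-\dot{\xi}(t|t) = A(t)\xi(t|t) + \Sigma(t|t)C^T(t)R^{-1}(t)\left(z(t) - C(t)\xi(t|t)\right), \qquad (72)$$

$$\hat{x}(t|T) = \left(P^{-1}(t|t) + \Sigma^{-1}(t|t)\right)^{-1}\left(P^{-1}(t|t)\hat{x}(t|t) + \Sigma^{-1}(t|t)\xi(t|t)\right), \qquad (73)$$

where $P(t|t)$ is the solution of the forward Riccati equation $\dot{P}(t|t) = A(t)P(t|t) + P(t|t)A^T(t) - P(t|t)C^T(t)R^{-1}(t)C(t)P(t|t) + B(t)Q(t)B^T(t)$ and $\Sigma(t|t)$ is the solution of the backward Riccati equation $-\dot{\Sigma}(t|t) = A(t)\Sigma(t|t) + \Sigma(t|t)A^T(t) - \Sigma(t|t)C^T(t)R^{-1}(t)C(t)\Sigma(t|t) + B(t)Q(t)B^T(t)$.

It can be seen from (72) that the backward state estimates, $\xi(t)$, are obtained by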
simply running a Kalman filter over the time-reversed measurements. Fraser and 
Potter’s approach is pragmatic: when the data is noisy, a linear combination of two 
filtered estimates is likely to be better than one filter alone. However, this two- 
filter approach to smoothing is ad hoc and is not a minimum-mean-square-error 
design. 
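
The combination step (73) reduces to inverse-variance weighting in the scalar case. The following is a minimal sketch; the function name `combine` and the numerical values are hypothetical, and the returned combined variance is the usual value for two estimates with independent errors, which is an assumption made here rather than a statement from the text.

```python
def combine(x_fwd, p_fwd, x_bwd, p_bwd):
    """Scalar form of (73): weight each estimate by the inverse of its error variance."""
    info = 1.0 / p_fwd + 1.0 / p_bwd                     # P^-1 + Sigma^-1
    x_smooth = (x_fwd / p_fwd + x_bwd / p_bwd) / info
    # Assumption: independent errors, so the combined variance is 1 / info.
    return x_smooth, 1.0 / info

# Equal confidence in both passes: the result is the midpoint.
x_equal, p_equal = combine(x_fwd=1.0, p_fwd=1.0, x_bwd=3.0, p_bwd=1.0)

# A more uncertain backward estimate pulls the result towards the forward one.
x_skew, p_skew = combine(x_fwd=0.0, p_fwd=1.0, x_bwd=4.0, p_bwd=3.0)

print(x_equal, p_equal)
print(x_skew, p_skew)
```

The estimate with the smaller error variance receives the larger weight, which is the sense in which the inverse covariances indicate the quality of the respective estimates.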


6.5.4 The Minimum-Variance Smoother 


6.5.4.1 Problem Definition 


The previously described smoothers are focussed on state estimation. A different signal estimation problem shown in Fig. 1 is considered here. Suppose that observations $z = y_2 + v$ are available, where $y_2 = \mathcal{G}_2 w$ is the output of a linear time-varying system and $v$ is measurement noise. A solution $\mathcal{H}$ is desired which produces estimates $\hat{y}_1$ of a second reference system $y_1 = \mathcal{G}_1 w$ in such a way to meet a performance objective. Let $e = y_1 - \hat{y}_1$ denote the output estimation error. The optimum minimum-variance filter can be obtained by finding the solution that minimises $\|ee^T\|_2$. Here, in the case of smoothing, the performance objective is to minimise $\|ee^H\|_2$.

Fig. 1. The general estimation problem. The objective is to produce estimates $\hat{y}_1$ of $y_1$ from measurements $z$.


6.5.4.2 Optimal Unrealisable Solutions 


The minimum-variance smoother is a more recent innovation [8] - [10] and arises 
by generalising Wiener’s optimal noncausal solution for the above time-varying 
problem. The solution is obtained using the same completing-the-squares 
technique that was previously employed in the frequency domain (see Chapters 1 
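and 2). It can be seen from Fig. 1 that the output estimation error is generated by $e = \mathcal{R}_{ei}\,i$, where

$$\mathcal{R}_{ei} = -\begin{bmatrix} \mathcal{H} & \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1 \end{bmatrix} \qquad (74)$$

is a linear system that operates on the inputs $i = \begin{bmatrix} v \\ w \end{bmatrix}$.

Consider the factorisation

$$\Delta\Delta^H = \mathcal{G}_2Q\mathcal{G}_2^H + R, \qquad (75)$$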


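in which the time-dependence of $Q(t)$ and $R(t)$ is omitted for notational brevity. Suppose that $\Delta$ is causal, namely $\Delta$ and its inverse, $\Delta^{-1}$, are bounded systems that proceed forward in time. The system $\Delta$ is known as a Wiener-Hopf factor.

Lemma 8: Assume that the Wiener-Hopf factor inverse, $\Delta^{-1}$, exists over $t \in [0, T]$. Then the smoother solution

$$\mathcal{H} = \mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H}\Delta^{-1} = \mathcal{G}_1Q\mathcal{G}_2^H(\Delta\Delta^H)^{-1} = \mathcal{G}_1Q\mathcal{G}_2^H(\mathcal{G}_2Q\mathcal{G}_2^H + R)^{-1} \qquad (76)$$

minimises $\|ee^H\|_2 = \|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_2$.

Proof: It follows from (74) that $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{G}_1Q\mathcal{G}_1^H - \mathcal{G}_1Q\mathcal{G}_2^H\mathcal{H}^H - \mathcal{H}\mathcal{G}_2Q\mathcal{G}_1^H + \mathcal{H}\Delta\Delta^H\mathcal{H}^H$. Completing the square leads to $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{R}_{ei1}\mathcal{R}_{ei1}^H + \mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$, where

$$\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H = (\mathcal{H}\Delta - \mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H})(\mathcal{H}\Delta - \mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H})^H \qquad (77)$$

and

$$\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = \mathcal{G}_1Q\mathcal{G}_1^H - \mathcal{G}_1Q\mathcal{G}_2^H(\Delta\Delta^H)^{-1}\mathcal{G}_2Q\mathcal{G}_1^H. \qquad (78)$$

By inspection of (77), the solution (76) achieves

$$\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0. \qquad (79)$$

Since $\|\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H\|_2$ excludes the estimator solution $\mathcal{H}$, this quantity defines the lower bound for $\|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_2$.

Example 3. Consider the output estimation case where $\mathcal{G}_1 = \mathcal{G}_2$ and

$$\mathcal{H}_{OE} = \mathcal{G}_2Q\mathcal{G}_2^H(\mathcal{G}_2Q\mathcal{G}_2^H + R)^{-1}, \qquad (80)$$

which is of order $n^4$ complexity. Using $\mathcal{G}_2Q\mathcal{G}_2^H = \Delta\Delta^H - R$ leads to the $n^2$-order solution

$$\mathcal{H}_{OE} = I - R(\Delta\Delta^H)^{-1}. \qquad (81)$$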


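It is interesting to note from (81) and

$$\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = \mathcal{G}_2Q\mathcal{G}_2^H - \mathcal{G}_2Q\mathcal{G}_2^H(\mathcal{G}_2Q\mathcal{G}_2^H + R)^{-1}\mathcal{G}_2Q\mathcal{G}_2^H \qquad (82)$$

that $\lim_{R \to 0}\mathcal{H}_{OE} = I$ and $\lim_{R \to 0}\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = 0$. That is, output estimation is superfluous when measurement noise is absent. Let $\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+ = \{\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H\}_+ + \{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+$ denote the causal part of $\mathcal{R}_{ei}\mathcal{R}_{ei}^H$. It is shown below that the minimum-variance filter solution can be found using the above completing-the-squares technique and taking causal parts.

Lemma 9: The filter solution

$$\{\mathcal{H}\}_+ = \{\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H}\Delta^{-1}\}_+ = \{\mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H}\}_+\Delta^{-1} \qquad (83)$$

minimises $\|ee^T\|_2 = \|\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+\|_2$, provided that the inverses exist.

Proof: It follows from (77) that

$$\{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+ = \{(\mathcal{H}\Delta - \mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H})(\mathcal{H}\Delta - \mathcal{G}_1Q\mathcal{G}_2^H\Delta^{-H})^H\}_+. \qquad (84)$$

By inspection of (84), the solution (83) achieves

$$\|\{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+\|_2 = 0. \qquad (85)$$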


It is worth pausing at this juncture to comment on the significance of the above 
results. 

• The formulation (76) is an optimal solution for the time-varying smoother problem since it can be seen from (79) that it achieves the best-possible (minimum-error-variance) performance.

• Similarly, (83) is termed an optimal solution because it achieves the best-possible filter performance (85).

• By inspection of (79) and (85) it follows that the minimum-variance smoother outperforms the minimum-variance filter.

• In general, these optimal solutions are not very practical because of the difficulty in realising an exact Wiener-Hopf factor.


Practical smoother (and filter) solutions that make use of an approximate Wiener- 
Hopf factor are described below. 


6.5.4.3 Optimal Realisable Solutions


Output Estimation 


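The Wiener-Hopf factor is modelled on the structure of the spectral factor which is described in Section 3.4.4. Suppose that $R(t) > 0$ for all $t \in [0, T]$ and there exist $R^{1/2}(t) > 0$ such that $R(t) = R^{1/2}(t)R^{1/2}(t)$. An approximate Wiener-Hopf factor, $\hat{\Delta}$, is defined by the system

$$\begin{bmatrix} \dot{x}(t) \\ \delta(t) \end{bmatrix} = \begin{bmatrix} A(t) & K(t)R^{1/2}(t) \\ C(t) & R^{1/2}(t) \end{bmatrix}\begin{bmatrix} x(t) \\ z(t) \end{bmatrix}, \qquad (86)$$

where $K(t) = P(t)C^T(t)R^{-1}(t)$ is the Kalman gain in which $P(t)$ is the solution of the Riccati differential equation

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t). \qquad (87)$$

The output estimation smoother (81) can be approximated as

$$\mathcal{H}_{OE} = I - R(\hat{\Delta}\hat{\Delta}^H)^{-1} = I - R\hat{\Delta}^{-H}\hat{\Delta}^{-1}. \qquad (88)$$

An approximate Wiener-Hopf factor inverse, $\hat{\Delta}^{-1}$, within (88) is obtained from (86) and the Matrix Inversion Lemma, namely,

$$\begin{bmatrix} \dot{\hat{x}}(t) \\ \alpha(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & K(t) \\ -R^{-1/2}(t)C(t) & R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \hat{x}(t) \\ z(t) \end{bmatrix}, \qquad (89)$$

where $\hat{x}(t) \in \mathbb{R}^n$ is an estimate of the state within $\hat{\Delta}^{-1}$. From Lemma 1, the adjoint of $\hat{\Delta}^{-1}$, which is denoted by $\hat{\Delta}^{-H}$, has the realisation

$$\begin{bmatrix} -\dot{\xi}(t) \\ \beta(t) \end{bmatrix} = \begin{bmatrix} A^T(t) - C^T(t)K^T(t) & -C^T(t)R^{-1/2}(t) \\ K^T(t) & R^{-1/2}(t) \end{bmatrix}\begin{bmatrix} \xi(t) \\ \alpha(t) \end{bmatrix}, \qquad (90)$$

where $\xi(t) \in \mathbb{R}^n$ is an estimate of the state within $\hat{\Delta}^{-H}$. Thus, the smoother (88) is realised by (89), (90) and

$$\hat{y}(t|T) = z(t) - R(t)\beta(t). \qquad (91)$$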


Procedure 1. The above output estimator can be implemented via the following three steps.

Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements $z(t)$ using (89) to obtain $\alpha(t)$.

Step 2. In lieu of the adjoint system (90), operate (89) on the time-reversed transpose of $\alpha(t)$. Then take the time-reversed transpose of the result to obtain $\beta(t)$.

Step 3. Calculate the smoothed output estimate from (91).


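Example 4. Consider an estimation problem parameterised by $a = -1$, $b = \sqrt{2}$, $c = 1$, $d = 0$, $\sigma_w^2 = \sigma_v^2 = 1$, which leads to $p = k = \sqrt{3} - 1$ [26]. Smoothed output estimates may be obtained by evolving

$$\dot{\hat{x}}(t) = -\sqrt{3}\,\hat{x}(t) + (\sqrt{3} - 1)z(t), \qquad \alpha(t) = -\hat{x}(t) + z(t),$$

time-reversing the $\alpha(t)$ and evolving

$$\dot{\xi}(t) = -\sqrt{3}\,\xi(t) + (\sqrt{3} - 1)\alpha(t), \qquad \beta(t) = -\xi(t) + \alpha(t),$$

then time-reversing $\beta(t)$ and calculating

$$\hat{y}(t|T) = z(t) - \beta(t).$$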
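
Procedure 1 can be tried out on the scalar problem of Example 4. The following is a minimal sketch, assuming an Euler discretisation of (89) with the coefficients $a - kc$ and $k$ taken from the stated parameters, sampled white measurement noise of variance $r/\Delta t$, and a hypothetical step size, record length and seed; the helper names `inverse_factor` and `mse` are illustrative.

```python
import math
import random

a, b, c, q, r = -1.0, math.sqrt(2.0), 1.0, 1.0, 1.0
k = math.sqrt(3.0) - 1.0            # steady-state gain quoted in Example 4
dt, steps = 0.01, 50_000            # assumed discretisation and record length
rng = random.Random(1)

# Simulate the plant output y and the noisy measurements z = y + v.
x, y, z = 0.0, [], []
for _ in range(steps):
    y.append(c * x)
    z.append(c * x + rng.gauss(0.0, math.sqrt(r / dt)))   # sampled white noise
    x += dt * a * x + b * rng.gauss(0.0, math.sqrt(q * dt))

def inverse_factor(u):
    """Euler discretisation of (89): returns the filtered outputs and alpha."""
    xh, y_filt, alpha = 0.0, [], []
    for u_k in u:
        y_filt.append(c * xh)
        alpha.append((u_k - c * xh) / math.sqrt(r))
        xh += dt * ((a - k * c) * xh + k * u_k)
    return y_filt, alpha

y_filt, alpha = inverse_factor(z)               # Step 1
_, beta_rev = inverse_factor(alpha[::-1])       # Step 2: time-reversed input
beta = beta_rev[::-1]                           #         and time-reversed output
y_smooth = [z_k - r * beta_k for z_k, beta_k in zip(z, beta)]   # Step 3, (91)

def mse(estimate):
    """Mean-square error of an output estimate against the simulated output."""
    return sum((e - t) ** 2 for e, t in zip(estimate, y)) / len(y)

print(f"raw measurement MSE : {mse(z):.3f}")
print(f"filtered output MSE : {mse(y_filt):.3f}")
print(f"smoothed output MSE : {mse(y_smooth):.3f}")
```

Because the system is scalar and time-invariant, the transpose in Step 2 is trivial and the same routine serves for both passes. The smoothed output error should fall below the filtered output error, in line with the comparison of (79) and (85).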
Filtering 
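The causal part $\{\mathcal{H}_{OE}\}_+$ of the minimum-variance smoother (88) is given by

$$\{\mathcal{H}_{OE}\}_+ = I - R\{\hat{\Delta}^{-H}\}_+\hat{\Delta}^{-1} = I - RR^{-1/2}\hat{\Delta}^{-1} = I - R^{1/2}\hat{\Delta}^{-1}. \qquad (92)$$

Employing (89) within (92) leads to the standard minimum-variance filter, namely,

$$\dot{\hat{x}}(t|t) = (A(t) - K(t)C(t))\hat{x}(t|t) + K(t)z(t), \qquad (93)$$

$$\hat{y}(t|t) = C(t)\hat{x}(t|t). \qquad (94)$$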


Input Estimation


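As discussed in Chapters 1 and 2, input estimates can be found using $\mathcal{G}_1 = I$, and substituting $\hat{\Delta}$ for $\Delta$ within (76) yields the solution

$$\mathcal{H}_{IE} = Q\mathcal{G}_2^H(\hat{\Delta}\hat{\Delta}^H)^{-1} = Q\mathcal{G}_2^H\hat{\Delta}^{-H}\hat{\Delta}^{-1}. \qquad (95)$$

As expected, the low-measurement-noise-asymptote of this equaliser is given by $\lim_{R \to 0}\mathcal{H}_{IE} = \mathcal{G}_2^{-1}$. That is, at high signal-to-noise-ratios the equaliser approaches $\mathcal{G}_2^{-1}$, provided the inverse exists.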


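The development of a differential equation for the smoothed input estimate, $\hat{w}(t|T)$, makes use of the following formula [27] for the cascade of two systems.

Suppose that two linear systems $\mathcal{G}_1$ and $\mathcal{G}_2$ have state-space parameters $\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix}$ and $\begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}$, respectively. Then $\mathcal{G}_2\mathcal{G}_1$ is parameterised by

$$\begin{bmatrix} A_1 & 0 & B_1 \\ B_2C_1 & A_2 & B_2D_1 \\ D_2C_1 & C_2 & D_2D_1 \end{bmatrix}.$$

It follows that $\hat{w}(t|T) = Q\mathcal{G}_2^H\hat{\Delta}^{-H}\alpha(t)$ is realised by

$$\begin{bmatrix} -\dot{\xi}(t) \\ -\dot{\gamma}(t) \\ \hat{w}(t|T) \end{bmatrix} = \begin{bmatrix} A^T(t) - C^T(t)K^T(t) & 0 & -C^T(t)R^{-1/2}(t) \\ C^T(t)K^T(t) & A^T(t) & C^T(t)R^{-1/2}(t) \\ Q(t)D^T(t)K^T(t) & Q(t)B^T(t) & Q(t)D^T(t)R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \xi(t) \\ \gamma(t) \\ \alpha(t) \end{bmatrix}, \tag{96}$$

in which $\gamma(t) \in \mathbb{R}^n$ is an auxiliary state.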


Procedure 2. Input estimates can be calculated via the following two steps. 


Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements $z(t)$ using (89) to obtain $\alpha(t)$.
Step 2. In lieu of (96), operate the adjoint of (96) on the time-reversed transpose
of $\alpha(t)$. Then take the time-reversed transpose of the result.


State Estimation 


Smoothed state estimates can be obtained by defining the reference system $\mathcal{G}_1$
within (76) as

$$\dot{\hat{x}}(t|T) = A(t)\hat{x}(t|T) + B(t)\hat{w}(t|T). \tag{97}$$

“In questions of science, the authority of a thousand is not worth the humble reasoning of a single
individual.” Galileo Galilei


That is, a smoother for state estimation is given by (89), (96) and (97). In 
frequency-domain estimation problems, minimum-order solutions are found by 
exploiting pole-zero cancellations, see Example 13 of Chapter 1. Here in the time- 
domain, (89), (96), (97) is not a minimum-order solution and some numerical 
model order reduction may be required. 


Suppose that C(f) is of rank n and D(f) = 0. In this special case, an n?-order 
solution for state estimation can be obtained from (91) and 


$$\hat{x}(t|T) = C^{\#}(t)\hat{y}(t|T), \tag{98}$$


where


$$C^{\#}(t) = (C^T(t)C(t))^{-1}C^T(t) \tag{99}$$


denotes the Moore-Penrose pseudoinverse. 


6.5.4.4 Performance 


An analysis of minimum-variance smoother performance requires an identity
which is described after introducing some additional notation. Let $\alpha = \mathcal{G}_0 w$
denote the output of a linear time-varying system having the realisation

$$\dot{x}(t) = A(t)x(t) + w(t), \tag{100}$$
$$\alpha(t) = x(t), \tag{101}$$

where $w(t) \in \mathbb{R}^n$ and $A(t) \in \mathbb{R}^{n \times n}$. By inspection of (100) – (101), the output of
the inverse system $w = \mathcal{G}_0^{-1}\alpha$ is given by

$$w(t) = \dot{\alpha}(t) - A(t)\alpha(t). \tag{102}$$

Similarly, let $\beta = \mathcal{G}_0^H u$ denote the output of the adjoint system $\mathcal{G}_0^H$, which from
Lemma 1 has the realisation


“The mind likes a strange idea as little as the body likes a strange protein and resists it with similar 
energy. It would not perhaps be too fanciful to say that a new idea is the most quickly acting antigen 
known to science. If we watch ourselves honestly we shall often find that we have begun to argue 
against a new idea even before it has been completely stated.” Wilfred Batten Lewis Trotter 
$$-\dot{\zeta}(t) = A^T(t)\zeta(t) + u(t), \tag{103}$$
$$\beta(t) = \zeta(t). \tag{104}$$

It follows that the output of the inverse system $u = \mathcal{G}_0^{-H}\beta$ is given by

$$u(t) = -\dot{\beta}(t) - A^T(t)\beta(t). \tag{105}$$


The following identity is required in the characterisation of smoother performance 


$$-P(t)A^T(t) - A(t)P(t) = P(t)\mathcal{G}_0^{-H} + \mathcal{G}_0^{-1}P(t), \tag{106}$$

where $P(t)$ is an arbitrary matrix of compatible dimensions. The above equation
can be verified by using (102) and (105) within (106). Using the above notation,
the exact Wiener-Hopf factor satisfies

$$\Delta\Delta^H = C(t)\mathcal{G}_0B(t)Q(t)B^T(t)\mathcal{G}_0^HC^T(t) + R(t). \tag{107}$$

It is observed below that the approximate Wiener-Hopf factor (86) approaches the
exact Wiener-Hopf factor whenever the problem is locally stationary, that is,
whenever $A(t)$, $B(t)$, $C(t)$, $Q(t)$ and $R(t)$ change sufficiently slowly, so that $\dot{P}(t)$ of
(87) approaches the zero matrix.

Lemma 10 [8]: In respect of the signal model (1) – (2) with $D(t) = 0$, $E\{w(t)\} = E\{v(t)\} = 0$, $E\{w(t)w^T(t)\} = Q(t)$, $E\{v(t)v^T(t)\} = R(t)$, $E\{w(t)v^T(t)\} = 0$ and the
quantities defined above,

$$\hat{\Delta}\hat{\Delta}^H = \Delta\Delta^H - C(t)\mathcal{G}_0\dot{P}(t)\mathcal{G}_0^HC^T(t). \tag{108}$$

Proof: The approximate Wiener-Hopf factor may be written as $\hat{\Delta} = C(t)\mathcal{G}_0K(t)R^{1/2}(t) + R^{1/2}(t)$. It is easily shown that $\hat{\Delta}\hat{\Delta}^H = C(t)\mathcal{G}_0(P(t)\mathcal{G}_0^{-H} + \mathcal{G}_0^{-1}P(t) + K(t)R(t)K^T(t))\mathcal{G}_0^HC^T(t) + R(t)$ and using (106) gives $\hat{\Delta}\hat{\Delta}^H = C(t)\mathcal{G}_0(B(t)Q(t)B^T(t) - \dot{P}(t))\mathcal{G}_0^HC^T(t) + R(t)$. The result follows by
comparing $\hat{\Delta}\hat{\Delta}^H$ and (107).


+ 


Consequently, the minimum-variance smoother (88) achieves the best-possible
estimator performance, namely $\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0$, whenever the problem is locally
stationary.


Lemma 11 [8]: The output estimation smoother (88) satisfies 


“Every great advance in natural knowledge has involved the absolute rejection of authority.” Thomas 
Henry Huxley 
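$$\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H = R(t)\left[(\Delta\Delta^H)^{-1} - (\Delta\Delta^H - C(t)\mathcal{G}_0\dot{P}(t)\mathcal{G}_0^HC^T(t))^{-1}\right]R(t). \tag{109}$$


Proof: Substituting (88) into (77) yields


$$\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H = R(t)\left[(\Delta\Delta^H)^{-1} - (\hat{\Delta}\hat{\Delta}^H)^{-1}\right]R(t). \tag{110}$$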


The result is now immediate from (108) and (110). 


Conditions for the convergence of the Riccati differential equation solution (87)
and hence the asymptotic optimality of the smoother (88) are set out below.

Lemma 12 [8]: Let $S(t) = C^T(t)R^{-1}(t)C(t)$. If

(i) there exist solutions $P(t) \geq P(t+\delta_t)$ of (87) for a $t > \delta_t > 0$; and

(ii)
$$\begin{bmatrix} Q(t) & A(t) \\ A^T(t) & -S(t) \end{bmatrix} \geq \begin{bmatrix} Q(t+\delta_t) & A(t+\delta_t) \\ A^T(t+\delta_t) & -S(t+\delta_t) \end{bmatrix} \tag{111}$$
for all $t > \delta_t$, then

$$\lim_{t \to \infty} \|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0. \tag{112}$$

Proof: Conditions (i) and (ii) together with Theorem 1 imply $P(t) \geq P(t+\delta_t)$ for all
$t > \delta_t$ and $\lim_{t \to \infty} \dot{P}(t) = 0$. The claim (112) is now immediate from Lemma 11.


6.5.5 Performance Comparison 


The following scalar time-invariant examples compare the performance of the 
minimum-variance filter (92), maximum-likelihood smoother (50), Fraser-Potter 
smoother (73) and minimum-variance smoother (88) under Gaussian and 
nongaussian noise conditions. 


Example 5 [9]. Suppose that A = −1 and B = C = Q = 1. Simulations were
conducted using T = 100 s, δt = 1 ms and 1000 realisations of Gaussian noise


processes. The mean-square-error (MSE) exhibited by the filter and smoothers as 
a function of the input signal-to-noise ratio (SNR) is shown in Fig. 2. As expected, 
it can be seen that the smoothers outperform the filter. Although the minimum- 
variance smoother exhibits the lowest mean-square error, the performance benefit 
diminishes at high signal-to-noise ratios. 


“The definition of insanity is doing the same thing over and over again and expecting different results.” 
Albert Einstein 
Example 6 [9]. Suppose instead that the process noise is the unity-variance
deterministic signal $w(t) = \sigma_{\sin(t)}^{-1}\sin(t)$, where $\sigma_{\sin(t)}^2$ denotes the sample variance
of $\sin(t)$. The results of simulations employing the sinusoidal process noise and
Gaussian measurement noise are shown in Fig. 3. Once again, the smoothers 
exhibit better performance than the filter. It can be seen that the minimum- 
variance smoother provides the best mean-square-error performance. The 
minimum-variance smoother appears to be less perturbed by nongaussian noises 
because it does not rely on assumptions about the underlying distributions. 


[Fig. 2. MSE (dB) versus SNR (dB) for Example 5: (i) minimum-variance smoother, (ii) Fraser-Potter smoother, (iii) maximum-likelihood smoother and (iv) minimum-variance filter.]

[Fig. 3. MSE (dB) versus SNR (dB) for Example 6: (i) minimum-variance smoother, (ii) Fraser-Potter smoother, (iii) maximum-likelihood smoother and (iv) minimum-variance filter.]


6.6 Chapter Summary 


The fixed-point smoother produces state estimates at some previous point in time, 
that is, 


$$\dot{\hat{\xi}}(t) = \Sigma(t)C^T(t)R^{-1}(t)(z(t) - C(t)\hat{x}(t|t)),$$

where $\Sigma(t)$ is the smoother error covariance.


In fixed-lag smoothing, state estimates are calculated at a fixed time delay $\tau$
behind the current measurements. This smoother has the form

$$\begin{aligned} \dot{\hat{x}}(t|t+\tau) = {} & A(t)\hat{x}(t|t+\tau) + B(t)Q(t)B^T(t)P^{-1}(t)(\hat{x}(t|t+\tau) - \hat{x}(t|t)) \\ & + P(t)\Phi^T(t+\tau,t)C^T(t+\tau)R^{-1}(t+\tau)(z(t+\tau) - C(t+\tau)\hat{x}(t+\tau|t+\tau)), \end{aligned}$$

where $\Phi(t+\tau, t)$ is the transition matrix of the minimum-variance filter.


“The first problem for all of us, men and women, is not to learn but to unlearn.” Gloria Marie Steinem

Three common fixed-interval smoothers, which are intended for retrospective (or
off-line) data analysis, are listed in Table 1. The Rauch-Tung-Striebel (RTS) smoother
and Fraser-Potter (FP) smoother are minimum-order solutions. The RTS smoother
differential equation evolves backward in time, in which $G(\tau) = B(\tau)Q(\tau)B^T(\tau)P^{-1}(\tau)$ is the smoothing gain. The FP smoother employs a linear
combination of forward state estimates and backward state estimates obtained by
running a filter over the time-reversed measurements. The optimum minimum-variance
solution, in which $\bar{A}(t) = A(t) - K(t)C(t)$ and $K(t)$ is the predictor
gain, involves a cascade of forward and adjoint predictions. It can be seen that the
optimum minimum-variance smoother is the most complex and so any
performance benefits need to be reconciled with the increased calculation cost.


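| | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and system | $E\{w(t)\} = E\{v(t)\} = 0$. $E\{w(t)w^T(t)\} = Q(t) > 0$ and $E\{v(t)v^T(t)\} = R(t) > 0$ are known. $A(t)$, $B(t)$ and $C(t)$ are known. | $\dot{x}(t) = A(t)x(t) + B(t)w(t)$; $y(t) = C(t)x(t)$; $z(t) = y(t) + v(t)$ |
| RTS smoother | Assumes that the filtered and smoothed states are normally distributed. $\hat{x}(t \mid t)$ previously calculated by Kalman filter. | $-\dot{\hat{x}}(\tau \mid T) = A(\tau)\hat{x}(\tau \mid T) + G(\tau)(\hat{x}(\tau \mid T) - \hat{x}(\tau \mid \tau))$ |
| FP smoother | $\hat{x}(t \mid t)$ previously calculated by Kalman filter. | $\hat{x}(t \mid T) = (P^{-1}(t \mid t) + \Sigma^{-1}(t \mid t))^{-1}(P^{-1}(t \mid t)\hat{x}(t \mid t) + \Sigma^{-1}(t \mid t)\xi(t \mid t))$ |
| Optimal smoother | | $\begin{bmatrix} \dot{\hat{x}}(t) \\ \alpha(t) \end{bmatrix} = \begin{bmatrix} \bar{A}(t) & K(t) \\ -R^{-1/2}(t)C(t) & R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \hat{x}(t) \\ z(t) \end{bmatrix}$; $\begin{bmatrix} -\dot{\xi}(t) \\ \beta(t) \end{bmatrix} = \begin{bmatrix} \bar{A}^T(t) & -C^T(t)R^{-1/2}(t) \\ K^T(t) & R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \xi(t) \\ \alpha(t) \end{bmatrix}$; $\hat{y}(t \mid T) = z(t) - R(t)\beta(t)$ |

Table 1. Continuous-time fixed-interval smoothers.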


“Remember a dead fish can float downstream but it takes a live one to swim upstream.” William 
Claude Fields 
The output estimation error covariance for the general estimation problem can be
written as $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{R}_{ei1}\mathcal{R}_{ei1}^H + \mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$, where $\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H$ specifies a lower
performance bound and $\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$ is a function of the estimator solution. The
optimal smoother solution achieves $\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2 = 0$ and provides the best
mean-square-error performance, provided of course that the problem assumptions
are correct. The minimum-variance smoother solution also attains best-possible
performance whenever the problem is locally stationary, that is, when $A(t)$, $B(t)$,
$C(t)$, $Q(t)$ and $R(t)$ change sufficiently slowly.


6.7. Problems 


Problem 1. Write down augmented state-space matrices $A^{(a)}(t)$, $B^{(a)}(t)$ and $C^{(a)}(t)$
for the continuous-time fixed-point smoother problem.

(i) Substitute the above matrices into $\dot{P}^{(a)}(t) = A^{(a)}(t)P^{(a)}(t) + P^{(a)}(t)(A^{(a)}(t))^T - P^{(a)}(t)(C^{(a)}(t))^TR^{-1}(t)C^{(a)}(t)P^{(a)}(t) + B^{(a)}(t)Q(t)(B^{(a)}(t))^T$ to obtain the component Riccati differential
equations.

(ii) Develop expressions for the continuous-time fixed-point smoother
estimate and the smoother gain.


Problem 2. The Hamiltonian equations (60) were derived from the forward 
version of the maximum likelihood smoother (54). Derive the alternative form 


eee A(t) ce Sel vee 0 | 
dein) | Lcor'oco 4@  jLeein] LcoR'ozo] 


from the backward smoother (50). Hint: use the backward Kalman-Bucy filter and 
the backward Riccati equation. 


Problem 3. It is shown in [6] and [17] that the intermediate variable within the 
Hamiltonian equations (60) is given by 


$$\xi(t|T) = \int_t^T \Phi^T(s,t)C^T(s)R^{-1}(s)(z(s) - C(s)\hat{x}(s|s))\,ds,$$


where $\Phi^T(s,t)$ is the transition matrix of the Kalman-Bucy filter. Use the above
equation to derive


“He who rejects change is the architect of decay. The only human institution which rejects progress is 
the cemetery.” James Harold Wilson 
-§(t|T)=-C'()R'()CORE|T) + AT OE-C™OR "(Oz(1)


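Problem 4. Show that the adjoint of a system having state-space parameters
$\begin{bmatrix} A(t) & B(t) \\ C(t) & D(t) \end{bmatrix}$ is parameterised by $\begin{bmatrix} -A^T(t) & -C^T(t) \\ B^T(t) & D^T(t) \end{bmatrix}$.

Problem 5. Suppose $\mathcal{G}_0$ is a system parameterised by $\begin{bmatrix} A(t) & I \\ I & 0 \end{bmatrix}$; show that

$$-P(t)A^T(t) - A(t)P(t) = P(t)\mathcal{G}_0^{-H} + \mathcal{G}_0^{-1}P(t).$$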


Problem 6. The optimum minimum-variance smoother was developed by finding
the solution that minimises $\|\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\|_2$. Use the same completing-the-square
approach to find the optimum minimum-variance filter. (Hint: Find the solution
that minimises $\|\{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+\|_2$.)


Problem 7 [9]. Derive the output estimation minimum-variance filter by finding a
solution Let $a \in \mathbb{R}$, $b = 1$, $c \in \mathbb{R}$ and $d = 0$ denote the time-invariant state-space
parameters of the plant $\mathcal{G}$. Denote the error covariance, gain of the Kalman filter
and gain of the maximum-likelihood smoother by p, k and g, respectively. Show
that

$$H_1(s) = k(s - a + kc)^{-1},$$
$$H_2(s) = cgk(-s - a + g)^{-1}(s - a + kc)^{-1},$$
$$H_3(s) = kc(-a + kc)(s - a + kc)^{-1}(-s - a + kc)^{-1},$$
$$H_4(s) = ((-a + kc)^2 - (-a + kc - k)^2)(s - a + kc)^{-1}(-s - a + kc)^{-1}$$

are the transfer functions of the Kalman filter, maximum-likelihood smoother, the
Fraser-Potter smoother and the minimum-variance smoother, respectively.


Problem 8. 

(i) Develop a state-space formulation of an approximate Wiener-Hopf factor
for the case when the plant includes a nonzero direct feedthrough matrix
(that is, D(t) ≠ 0).

(ii) Use the matrix inversion lemma to obtain the inverse of the approximate
Wiener-Hopf factor for the minimum-variance smoother.


“It is not the strongest of the species that survive, nor the most intelligent, but the most responsive to
change.” Charles Robert Darwin 
6.8 Glossary

$p(x(t))$    Probability density function of a continuous random variable $x(t)$.
$x(t) \sim \mathcal{N}(\mu, R_{xx})$    The random variable $x(t)$ has a normal distribution with mean $\mu$ and covariance $R_{xx}$.
$f(x(t))$    Cumulative distribution function or likelihood function of $x(t)$.
$\hat{x}(t|t+\tau)$    Estimate of $x(t)$ at time $t$ given data at fixed time lag $\tau$.
$\hat{x}(t|T)$    Estimate of $x(t)$ at time $t$ given data over a fixed interval $T$.
$\hat{w}(t|T)$    Estimate of $w(t)$ at time $t$ given data over a fixed interval $T$.
$G(\tau)$    Gain of the minimum-variance smoother developed by Rauch, Tung and Striebel.
$\mathcal{R}_{ei}$    A linear system that operates on the inputs $i = [v^T \; w^T]^T$ and generates the output estimation error $e$.
$C^{\#}(t)$    Moore-Penrose pseudoinverse of $C(t)$.
$\Delta$    The Wiener-Hopf factor which satisfies $\Delta\Delta^H = \mathcal{G}Q\mathcal{G}^H + R$.
$\hat{\Delta}$    Approximate Wiener-Hopf factor.
$\hat{\Delta}^H$    The adjoint of $\hat{\Delta}$.
$\hat{\Delta}^{-H}$    The inverse of $\hat{\Delta}^H$.
$\{\hat{\Delta}^{-H}\}_+$    The causal part of $\hat{\Delta}^{-H}$.
7. Discrete-Time Minimum-Variance
Smoothing 


7.1 Introduction 


Observations are invariably accompanied by measurement noise and optimal 
filters are the usual solution of choice. Filter performances that fall short of user 
expectations motivate the pursuit of smoother solutions. Smoothers promise useful 
mean-square-error improvement at mid-range signal-to-noise ratios, provided that 
the assumed model parameters and noise statistics are correct. 

In general, discrete-time filters and smoothers are more practical than the 
continuous-time counterparts. Often a designer may be able to value-add by 
assuming low-order discrete-time models which bear little or no resemblance to 
the underlying processes. Continuous-time approaches may be warranted only 
when application-specific performance considerations outweigh the higher 
overheads. 


This chapter canvasses the main discrete-time fixed-point, fixed-lag and
fixed-interval smoothing results [1] – [9]. Fixed-point smoothers [1] calculate an
improved estimate at a prescribed past instant in time. Fixed-lag smoothers [2] — 
[3] find application where small end-to-end delays are tolerable, for example, in 
press-to-talk communications or receiving public broadcasts. Fixed-interval 
smoothers [4] — [9] dispense with the need to fine tune the time of interest or the 
smoothing lags. They are suited to applications where processes are staggered 
such as delayed control or off-line data analysis. For example, in underground coal 
mining, smoothed position estimates and control signals can be calculated while a 
longwall shearer is momentarily stationary at each end of the face [9]. Similarly, 
in exploration drilling, analyses are typically carried out post-data acquisition. 


The smoother descriptions are organised as follows. Section 7.2 sets out two 
prerequisites: time-varying adjoint systems and Riccati difference equation 
comparison theorems. Fixed-point, fixed-lag and fixed-interval smoothers are 


“An inventor is simply a person who doesn't take his education too seriously. You see, from the time a 
person is six years old until he graduates from college he has to take three or four examinations a year. 
If he flunks once, he is out. But an inventor is almost always failing. He tries and fails maybe a 
thousand times. If he succeeds once then he's in. These two things are diametrically opposite. We often 
say that the biggest job we have is to teach a newly hired employee how to fail intelligently. We have 
to train him to experiment over and over and to keep on trying and failing until he learns what will 
work.” Charles Franklin Kettering 
discussed in Sections 7.3, 7.4 and 7.5, respectively. It turns out that the structures
of the discrete-time smoothers are essentially the same as those of the previously- 
described continuous-time versions. Differences arise in the calculation of Riccati 
equation solutions and the gain matrices. Consequently, the treatment is somewhat 
condensed. It is reaffirmed that the above-mentioned smoothers outperform the 
Kalman filter and the minimum-variance smoother provides the best performance. 


7.2 Prerequisites 
7.2.1 Time-varying Adjoint Systems 


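Consider a linear time-varying system, $\mathcal{G}$, operating on an input, $w$, namely, $y = \mathcal{G}w$. Here, $w = [w_1, \ldots, w_N] \in \mathbb{R}^{m \times N}$ denotes the matrix of $w_k \in \mathbb{R}^m$ over an
interval $k \in [1, N]$. It is assumed that $\mathcal{G}: \mathbb{R}^{m \times N} \to \mathbb{R}^{p \times N}$ has the state-space
realisation

$$x_{k+1} = A_kx_k + B_kw_k, \tag{1}$$
$$y_k = C_kx_k + D_kw_k. \tag{2}$$

As before, the adjoint system, $\mathcal{G}^H$, satisfies

$$\langle y, \mathcal{G}w \rangle = \langle \mathcal{G}^Hy, w \rangle \tag{3}$$

for all $y \in \mathbb{R}^{p \times N}$ and $w \in \mathbb{R}^{m \times N}$.

Lemma 1: In respect of the system $\mathcal{G}$ described by (1) – (2), with $x_0 = 0$, the
adjoint system $\mathcal{G}^H$ having the realisation

$$\zeta_{k-1} = A_k^T\zeta_k + C_k^Tu_k, \tag{4}$$
$$z_k = B_k^T\zeta_k + D_k^Tu_k, \tag{5}$$

with $\zeta_N = 0$, satisfies (3).

A proof appears in [7] and proceeds similarly to that within Lemma 1 of Chapter
2. The simplification $D_k = 0$ is assumed below unless stated otherwise.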
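The adjoint relationship (3) can be checked numerically. The sketch below is a minimal illustration for scalar, time-varying parameters: it runs the forward recursion with a zero initial state, runs a backward recursion with a zero terminal state, and compares the two inner products. The function names (`forward`, `adjoint`, `inner`) and the 0-based indexing are assumptions of this sketch rather than anything prescribed by the text.

```python
import random


def forward(A, B, C, D, w):
    """Return y = G w for x_{k+1} = A_k x_k + B_k w_k, y_k = C_k x_k + D_k w_k.

    The initial state is zero (scalar parameters, given as lists).
    """
    x, y = 0.0, []
    for a, b, c, d, w_k in zip(A, B, C, D, w):
        y.append(c * x + d * w_k)
        x = a * x + b * w_k
    return y


def adjoint(A, B, C, D, u):
    """Return z = G^H u using a backward recursion with zero terminal state.

    zeta_{k-1} = A_k zeta_k + C_k u_k and z_k = B_k zeta_k + D_k u_k
    (transposes are omitted because everything is scalar here).
    """
    n = len(u)
    zeta, z = 0.0, [0.0] * n
    for k in range(n - 1, -1, -1):
        z[k] = B[k] * zeta + D[k] * u[k]
        zeta = A[k] * zeta + C[k] * u[k]
    return z


def inner(p, q):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p_k * q_k for p_k, q_k in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    N = 50
    A = [rng.uniform(-0.9, 0.9) for _ in range(N)]
    B, C, D = ([rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(3))
    w = [rng.gauss(0.0, 1.0) for _ in range(N)]
    u = [rng.gauss(0.0, 1.0) for _ in range(N)]
    print(inner(u, forward(A, B, C, D, w)), inner(adjoint(A, B, C, D, u), w))
```

The two printed numbers should agree to within rounding error for any choice of the sequences `u` and `w`.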


7.2.2 Riccati Equation Comparison 


The ensuing performance comparisons of filters and smoothers require methods 
for comparing the solutions of Riccati difference equations which are developed 


“If you’re not failing every now and again, it’s a sign you’re not doing anything very innovative.” 
(Woody) Allen Stewart Konigsberg 
below. Simplified Riccati difference equations which do not involve the B, and
measurement noise covariance matrices are considered initially. A change of 
variables for the more general case is stated subsequently. 


E R”™’, C 


t+k-1 


Suppose there exist A, , e R™", One = Oss e R”™ and 
P.., = P,, € R”" forar>0 and k= 0. Following the approach of Wimmer 


[10], define the Riccati operator 


OE A gaa put Ops) = A, 5 Al, + y 


+k-l° t+k-1*"t+k-1 t+k-1 
AT A AT fh vi 
Aly Pa Cras 4+ C,.5 Pap Cae) Cel pais : (6) 
A ae see roe 
Let Ty., = ae we) denote the Hamiltonian matrix 
an Oa Aisa 


a . 0 -I 
corresponding to ®(P,, ,,4..,45C,.,4.Q0.,,) and define J = ( ; | in which 


J is an identity matrix of appropriate dimensions. It is known that solutions of (6) 


y Ay, 
are monotonically dependent on JT,,, = be ae | Consider a 
Aig oreo s 


second Riccati operator employing the same initial solution P,, , but different 


state-space parameters 


OP AC Og) = Avg PGA, + om 


t+k 
SA Pia (I at (Clee gira Ore Ye P Al, (7) 


t+k” t+k-1*"t+k * 


The following theorem, which is due to Wimmer [10], compares the above two 
Riccati operators. 


Theorem 1: [10]: Suppose that 


Gs : Alt > C4 Ait 
Ani SOC aa A a Oe 
forat=0Oand for all k = 0. Then 
DF Ags a Cis ? are) 2 OP. ? Aisi id Cy ? Oy, ) (8) 


forallk>=0. 


“Although personally I am quite content with existing explosives, I feel we must not stand in the path 
of improvement.” Winston Leonard Spencer-Churchill 
The above result underpins the following more general Riccati difference equation
comparison theorem. 


Theorem 2: [11], [8]: With the above definitions, suppose for a t = 0 and for all k 
> 0 that: 


(i) there existsa P > P,, and 
Ay va Ay T 
(ii) ee _ Ava | > re Aust | : 
Aye os OF OFT A,,, Cr Cre 
Then ae = Fee for all k 2 0. 


Proof: Assumption (i) is the k = 0 case for an induction argument. For the 
inductive step, denote Pex = OF t+k-1? Ay BC peas) and Peeps 


O(P t+k? A, ? Cy ? om ). Then 


Rae Ty =(®(F tt+k-1? Ay 19 “t+k—- Oe: = OF t+k-1? Aves C20.) 
+(OF t+k-1? Arse, Can» Quan) — OF t+k? AC a0): (9) 


The first term on the right-hand-side of (9) is non-negative by virtue of 
Assumption (ii) and Theorem 1. By appealing to Theorem 2 of Chapter 5, the 
—P 


t+k+1 za 


second term on the right-hand-side of (9) is non-negative and thus P. 
0. 


t+k 


A change of variables [8] $\bar{C}_k = R_k^{-1/2}C_k$ and $\bar{Q}_k = B_kQ_kB_k^T$ allows the
application of Theorem 2 to the more general forms of Riccati difference
equations.


7.3 Fixed-Point Smoothing 


7.3.1 Solution Derivation 


The development of a discrete-time fixed-point smoother follows the continuous- 
time case. An innovation by Zachrisson [12] involves transforming the smoothing 
problem into a filtering problem that possesses an augmented state. Following the
approach in [1], consider an augmented state vector $x_k^{(a)} = \begin{bmatrix} x_k \\ \xi_k \end{bmatrix}$ for the signal
model

“Never before in history has innovation offered promise of so much to so many in so short a time.” 
William Henry (Bill) Gates III 
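$$x_{k+1}^{(a)} = A_k^{(a)}x_k^{(a)} + B_k^{(a)}w_k, \tag{10}$$
$$z_k = C_k^{(a)}x_k^{(a)} + v_k, \tag{11}$$

where $A_k^{(a)} = \begin{bmatrix} A_k & 0 \\ 0 & I \end{bmatrix}$, $B_k^{(a)} = \begin{bmatrix} B_k \\ 0 \end{bmatrix}$ and $C_k^{(a)} = [C_k \;\; 0]$. It can be seen that the
first component of $x_k^{(a)}$ is $x_k$, the state of the system $x_{k+1} = A_kx_k + B_kw_k$, $z_k = C_kx_k + v_k$. The second component, $\xi_k$, equals $x_k$ at time $k = \tau$, that is, $\xi_k = x_\tau$. The
objective is to calculate an estimate $\hat{\xi}_k$ of $\xi_k$ at time $k = \tau$ from measurements $z_k$
over $k \in [0, N]$. A solution that minimises the variance of the estimation error is
obtained by employing the standard Kalman filter recursions for the signal model
(10) – (11). The predicted and corrected states are respectively obtained from

$$\hat{x}_{k/k}^{(a)} = (I - L_k^{(a)}C_k^{(a)})\hat{x}_{k/k-1}^{(a)} + L_k^{(a)}z_k, \tag{12}$$
$$\hat{x}_{k+1/k}^{(a)} = A_k^{(a)}\hat{x}_{k/k}^{(a)} \tag{13}$$
$$\phantom{\hat{x}_{k+1/k}^{(a)}} = (A_k^{(a)} - K_k^{(a)}C_k^{(a)})\hat{x}_{k/k-1}^{(a)} + K_k^{(a)}z_k, \tag{14}$$

where $K_k^{(a)} = A_k^{(a)}L_k^{(a)}$ is the predictor gain, $L_k^{(a)} = P_{k/k-1}^{(a)}(C_k^{(a)})^T(C_k^{(a)}P_{k/k-1}^{(a)}(C_k^{(a)})^T + R_k)^{-1}$ is the filter gain,

$$P_{k/k}^{(a)} = P_{k/k-1}^{(a)} - P_{k/k-1}^{(a)}(C_k^{(a)})^T(L_k^{(a)})^T \tag{15}$$

is the corrected error covariance and

$$\begin{aligned} P_{k+1/k}^{(a)} &= A_k^{(a)}P_{k/k}^{(a)}(A_k^{(a)})^T + B_k^{(a)}Q_k(B_k^{(a)})^T \qquad (16) \\ &= A_k^{(a)}P_{k/k-1}^{(a)}(A_k^{(a)})^T - A_k^{(a)}P_{k/k-1}^{(a)}(C_k^{(a)})^T(K_k^{(a)})^T + B_k^{(a)}Q_k(B_k^{(a)})^T \\ &= A_k^{(a)}P_{k/k-1}^{(a)}\left((A_k^{(a)})^T - (C_k^{(a)})^T(K_k^{(a)})^T\right) + B_k^{(a)}Q_k(B_k^{(a)})^T \qquad (17) \end{aligned}$$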

is the predicted error covariance. The above Riccati difference equation is written 
in the partitioned form 


“You can’t just ask customers what they want and then try to give that to them. By the time you get it 
built, they’ll want something new.” Steven Paul Jobs 
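$$\begin{aligned} P_{k+1/k}^{(a)} &= \begin{bmatrix} P_{k+1/k} & \Sigma_{k+1/k}^T \\ \Sigma_{k+1/k} & \Omega_{k+1/k} \end{bmatrix} \\ &= \begin{bmatrix} A_k & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} P_{k/k-1} & \Sigma_{k/k-1}^T \\ \Sigma_{k/k-1} & \Omega_{k/k-1} \end{bmatrix} \left( \begin{bmatrix} A_k^T & 0 \\ 0 & I \end{bmatrix} - \begin{bmatrix} C_k^T \\ 0 \end{bmatrix} \begin{bmatrix} K_k^T & L_k^T \end{bmatrix} \right) + \begin{bmatrix} B_kQ_kB_k^T & 0 \\ 0 & 0 \end{bmatrix}, \end{aligned} \tag{18}$$

in which the gains are given by

$$\begin{bmatrix} K_k \\ L_k \end{bmatrix} = \begin{bmatrix} A_k & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} P_{k/k-1} & \Sigma_{k/k-1}^T \\ \Sigma_{k/k-1} & \Omega_{k/k-1} \end{bmatrix} \begin{bmatrix} C_k^T \\ 0 \end{bmatrix} (C_kP_{k/k-1}C_k^T + R_k)^{-1} = \begin{bmatrix} A_kP_{k/k-1}C_k^T \\ \Sigma_{k/k-1}C_k^T \end{bmatrix} (C_kP_{k/k-1}C_k^T + R_k)^{-1}, \tag{19}$$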

see also [1]. The predicted error covariance components can be found from (18), 
viz., 


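$$P_{k+1/k} = A_kP_{k/k}A_k^T + B_kQ_kB_k^T, \tag{20}$$
$$\Sigma_{k+1/k} = \Sigma_{k/k-1}(A_k^T - C_k^TK_k^T), \tag{21}$$
$$\Omega_{k+1/k} = \Omega_{k/k-1} - \Sigma_{k/k-1}C_k^TL_k^T. \tag{22}$$

The sequences (21) – (22) can be initialised with $\Sigma_{\tau+1/\tau} = P_{\tau/\tau}$ and $\Omega_{\tau+1/\tau} = P_{\tau/\tau}$.
The state corrections are obtained from (12), namely,

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}(z_k - C_k\hat{x}_{k/k-1}), \tag{23}$$
$$\hat{\xi}_{k/k} = \hat{\xi}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}). \tag{24}$$

Similarly, the state predictions follow from (13),

$$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k}, \tag{25}$$
$$\hat{\xi}_{k+1/k} = \hat{\xi}_{k/k}. \tag{26}$$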

“Innovation is the whim of an elite before it becomes a need of the public.” Ludwig von Mises 
In summary, the fixed-point smoother estimates for $k \geq \tau$ are given by (24), which
is initialised by $\hat{\xi}_{\tau/\tau} = \hat{x}_{\tau/\tau}$. The smoother gain is calculated as
$L_k = \Sigma_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$, where $\Sigma_{k/k-1}$ is given by (21).
7.3.2 Performance 


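It follows from the above that $\Omega_{k+1/k} = \Omega_{k/k}$ and so

$$\Omega_{k+1/k+1} = \Omega_{k/k} - \Sigma_{k+1/k}C_{k+1}^TL_{k+1}^T. \tag{27}$$

Next, it is argued that the discrete-time fixed-point smoother provides a
performance improvement over the filter.

Lemma 2 [1]: In respect of the fixed-point smoother (24),

$$P_{\tau/\tau} \geq \Omega_{k+1/k}. \tag{28}$$

Proof: The recursion (22) may be written as the sum

$$\Omega_{k+1/k} = \Omega_{\tau/\tau} - \sum_{i=\tau}^{k} \Sigma_{i/i-1}C_i^T(C_iP_{i/i-1}C_i^T + R_i)^{-1}C_i\Sigma_{i/i-1}^T, \tag{29}$$

where $\Omega_{\tau/\tau} = P_{\tau/\tau}$. Hence, $P_{\tau/\tau} - \Omega_{k+1/k} = \sum_{i=\tau}^{k} \Sigma_{i/i-1}C_i^T(C_iP_{i/i-1}C_i^T + R_i)^{-1}C_i\Sigma_{i/i-1}^T \geq 0$.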
[Fig. 1(a). Smoother estimation variance $\Sigma_{k+1/k}$ versus k for Example 1, plotted for R = 0.01, 0.05, 0.1, 0.2, 0.5, 1, 5.]

[Fig. 1(b). Smoother estimation variance $\Omega_{k+1/k+1}$ versus k for Example 1, plotted for R = 0.01, 0.05, 0.1, 0.2, 0.5, 1, 5.]


“If the only tool you have is a hammer, you tend to see every problem as a nail.” Abraham Maslow

Example 1. Consider a first-order time-invariant plant, in which A = 0.9, B = 1, C
= 0.1 and Q = 1. An understanding of a fixed-point smoother’s performance can
be gleaned by examining the plots of the $\Sigma_{k+1/k}$ and $\Omega_{k+1/k+1}$ sequences shown in
Fig. 1(a) and (b), respectively. The bottom lines of the figures correspond to
measurement noise covariances of R = 0.01 and the top lines correspond to R = 5.
It can be seen for this example that the $\Sigma_{k+1/k}$ have diminishing impact after about
15 samples beyond the point of interest. From Fig. 1(b), it can be seen that
smoothing appears most beneficial at mid-range measurement noise power, such
as R = 0.2, since the plots of $\Omega_{k+1/k+1}$ become flatter for R ≥ 1 and R ≤ 0.05.
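The covariance recursions (21) and (22) are straightforward to evaluate for the scalar plant of Example 1. The following sketch is illustrative only: it assumes that the filter has reached steady state before the point of interest and that both sequences are initialised with the steady-state predicted error variance, and the function name `fixed_point_covariances` and its arguments are hypothetical.

```python
def fixed_point_covariances(a, b, c, q, r, steps=15, settle=500):
    """Iterate the scalar forms of (21) and (22) for a time-invariant plant.

    Returns the steady-state predicted variance p together with the
    Sigma and Omega sequences over `steps` samples past the point of interest.
    """
    # Steady-state predicted error variance from the Riccati difference equation.
    p = q
    for _ in range(settle):
        s = c * p * c + r
        p = a * p * a - (a * p * c) ** 2 / s + b * q * b

    sigma, omega = p, p          # assumed initialisation at the point of interest
    sigmas, omegas = [], []
    for _ in range(steps):
        s = c * p * c + r
        k_gain = a * p * c / s   # predictor gain K_k
        l_gain = sigma * c / s   # smoother gain L_k
        omega = omega - sigma * c * l_gain      # scalar form of (22)
        sigma = sigma * (a - k_gain * c)        # scalar form of (21)
        sigmas.append(sigma)
        omegas.append(omega)
    return p, sigmas, omegas


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 5.0):
        p, sigmas, omegas = fixed_point_covariances(0.9, 1.0, 0.1, 1.0, r)
        print(r, round(p, 4), round(omegas[-1], 4))
```

Each Omega sequence is non-increasing and starts below the filter variance, which is the behaviour asserted by Lemma 2.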


7.4 Fixed-Lag Smoothing 
7.4.1 High-order Solution 


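Discrete-time fixed-lag smoothers calculate state estimates, $\hat{x}_{k-N/k}$, at time k given
a delay of N steps. The objective is to minimise $E\{(x_{k-N} - \hat{x}_{k-N/k})(x_{k-N} - \hat{x}_{k-N/k})^T\}$. A common solution approach is to construct an augmented signal
model that includes delayed states and then apply the standard Kalman filter
recursions, see [1] – [3] and the references therein. Consider the signal model

$$\begin{bmatrix} x_{k+1} \\ x_k \\ x_{k-1} \\ \vdots \\ x_{k+1-N} \end{bmatrix} = \begin{bmatrix} A_k & 0 & \cdots & & 0 \\ I_n & 0 & & & \\ 0 & I_n & \ddots & & \vdots \\ \vdots & & \ddots & & \\ 0 & 0 & \cdots & I_n & 0 \end{bmatrix} \begin{bmatrix} x_k \\ x_{k-1} \\ x_{k-2} \\ \vdots \\ x_{k-N} \end{bmatrix} + \begin{bmatrix} B_k \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} w_k \tag{30}$$

and

$$z_k = \begin{bmatrix} C_k & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} x_k \\ x_{k-1} \\ x_{k-2} \\ \vdots \\ x_{k-N} \end{bmatrix} + v_k. \tag{31}$$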


By applying the Kalman filter recursions to the above signal model, the predicted 
states are obtained as 


“You have to seek the simplest implementation of a problem solution in order to know when you’ve 
reached your limit in that regard. Then it’s easy to make tradeoffs, to back off a little, for performance 
reasons.” Stephen Gary Wozniak 
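$$\begin{bmatrix} \hat{x}_{k+1/k} \\ \hat{x}_{k/k} \\ \hat{x}_{k-1/k} \\ \vdots \\ \hat{x}_{k+1-N/k} \end{bmatrix} = \begin{bmatrix} A_k & 0 & \cdots & & 0 \\ I_n & 0 & & & \\ 0 & I_n & \ddots & & \vdots \\ \vdots & & \ddots & & \\ 0 & 0 & \cdots & I_n & 0 \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ \hat{x}_{k-1/k-1} \\ \hat{x}_{k-2/k-1} \\ \vdots \\ \hat{x}_{k-N/k-1} \end{bmatrix} + \begin{bmatrix} K_{0,k} \\ K_{1,k} \\ K_{2,k} \\ \vdots \\ K_{N,k} \end{bmatrix} (z_k - C_k\hat{x}_{k/k-1}), \tag{32}$$

where $K_{0,k}$, $K_{1,k}$, $K_{2,k}$, ..., $K_{N,k}$ denote the submatrices of the predictor gain. Two
important observations follow from the above equation. First, the desired
smoothed estimates $\hat{x}_{k/k}$, ..., $\hat{x}_{k-N+1/k}$ are contained within the one-step-ahead
prediction (32). Second, the fixed-lag smoother (32) inherits the stability
properties of the original Kalman filter.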


7.4.2 Reduced-order Solution 


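Equation (32) is termed a high-order solution because the dimension of the above
augmented state matrix is $(N+2)n \times (N+2)n$. Moore [1] – [3] simplified (32) to
obtain elegant reduced-order solution structures as follows. Let

$$P_{k+1/k}^{(a)} = \begin{bmatrix} P_{k+1/k}^{(0,0)} & P_{k+1/k}^{(0,1)} & \cdots & P_{k+1/k}^{(0,N)} \\ P_{k+1/k}^{(1,0)} & P_{k+1/k}^{(1,1)} & & \vdots \\ \vdots & & \ddots & \\ P_{k+1/k}^{(N,0)} & \cdots & & P_{k+1/k}^{(N,N)} \end{bmatrix}, \qquad P_{k+1/k}^{(i,j)} = (P_{k+1/k}^{(j,i)})^T,$$

denote the predicted error covariance matrix. For $0 \leq i \leq N$, the smoothed states
within (32) are given by

$$\hat{x}_{k-i/k} = \hat{x}_{k-i/k-1} + K_{i+1,k}(z_k - C_k\hat{x}_{k/k-1}), \tag{33}$$

where

$$K_{i+1,k} = P_{k/k-1}^{(i,0)}C_k^T(C_kP_{k/k-1}^{(0,0)}C_k^T + R_k)^{-1}. \tag{34}$$

Recursions for the error covariance submatrices of interest are

$$P_{k+1/k}^{(i+1,0)} = P_{k/k-1}^{(i,0)}(A_k - K_{0,k}C_k)^T, \tag{35}$$
$$P_{k+1/k}^{(i+1,i+1)} = P_{k/k-1}^{(i,i)} - P_{k/k-1}^{(i,0)}C_k^TK_{i+1,k}^T. \tag{36}$$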

“When I am working on a problem, I never think about beauty but when I have finished, if the solution 
is not beautiful, I know it is wrong.” Richard Buckminster Fuller 
Another rearrangement of (33) – (34) to reduce the calculation cost further is
described in [1]. 


7.4.3. Performance 


Two facts that stem from (36) are stated below. 


Lemma 3: In respect of the fixed-lag smoother (33) – (36), the following applies.

(i) The error-performance improves with increasing smoothing lag.
(ii) The fixed-lag smoothers outperform the Kalman filter.

Proof:

(i) The claim follows by inspection of (34) and (36).
(ii) The observation follows by recognising that $P_{k+1/k}^{(1,1)} = E\{(x_k - \hat{x}_{k/k})(x_k - \hat{x}_{k/k})^T\}$ within (i).

It can also be seen from the term $-P_{k/k-1}^{(i,0)}C_k^T(C_kP_{k/k-1}^{(0,0)}C_k^T + R_k)^{-1}C_k(P_{k/k-1}^{(i,0)})^T$ within
(36) that the benefit of smoothing diminishes as $R_k$ becomes large.
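Lemma 3(i) can be illustrated by iterating scalar, steady-state forms of (34) to (36). The sketch below assumes a time-invariant plant whose filter has converged, so that the covariance submatrices no longer depend on k; the function name `fixed_lag_variances` and the chosen parameter values are hypothetical.

```python
def fixed_lag_variances(a, b, c, q, r, lag, settle=500):
    """Return [P(0,0), P(1,1), ..., P(lag+1,lag+1)] in steady state (scalar case).

    Entry 0 is the predicted error variance, entry 1 is the filtered error
    variance and entry i + 1 is the error variance of the lag-i smoothed estimate.
    """
    # Steady-state predicted error variance P^(0,0).
    p = q
    for _ in range(settle):
        s = c * p * c + r
        p = a * p * a - (a * p * c) ** 2 / s + b * q * b

    s = c * p * c + r
    k0 = a * p * c / s           # predictor gain K_0
    cross = p                    # P^(i,0), starting from i = 0
    var = p                      # P^(i,i), starting from i = 0
    out = [var]
    for _ in range(lag + 1):
        gain = cross * c / s                 # K_{i+1} from (34)
        var = var - cross * c * gain         # scalar form of (36)
        cross = cross * (a - k0 * c)         # scalar form of (35)
        out.append(var)
    return out


if __name__ == "__main__":
    for i, v in enumerate(fixed_lag_variances(0.9, 1.0, 0.1, 1.0, 0.2, lag=10)):
        print(i, round(v, 4))
```

The printed variances decrease with the lag index and level off once the cross-covariance term has decayed, so extra lag eventually buys little.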


7.5 Fixed-Interval Smoothing 
7.5.1 The Maximum-Likelihood Smoother 


7.5.1.1 Solution Derivation 


The most commonly used fixed-interval smoother is undoubtedly the solution 
reported by Rauch [5] in 1963 and two years later with Tung and Striebel [6]. 
Although this smoother does not minimise the error variance, it has two desirable 
attributes. First, it is a low-complexity state estimator. Second, it can provide close 
to optimal performance whenever the accompanying Gaussian distribution 
assumptions are reasonable. 


The smoother involves two passes. In the first (forward) pass, filtered state
estimates, $\hat{x}_{k/k}$, are calculated from

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}), \tag{37}$$
$$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k}, \tag{38}$$

where $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$ is the filter gain, $K_k = A_kL_k$ is the predictor
gain, in which $P_{k/k} = P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}$ and $P_{k+1/k} = A_kP_{k/k}A_k^T + B_kQ_kB_k^T$.

“Performance is your reality. Forget everything else.” Harold Sydney Geneen

G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2 ed.), Prime Publishing, 2019

In the second backward pass, Rauch, Tung and Striebel
calculate smoothed state estimates, $\hat{x}_{k/N}$, from the elegant one-line recursion

$$\hat{x}_{k/N} = \hat{x}_{k/k} + G_k(\hat{x}_{k+1/N} - \hat{x}_{k+1/k}), \tag{39}$$

where

$$G_k = P_{k/k}A_k^TP_{k+1/k}^{-1} \tag{40}$$

is the smoother gain. The above sequence is initialised by $\hat{x}_{k/N} = \hat{x}_{k/k}$ at k = N. In
the first public domain appearance of (39), Rauch [5] referred to a Lockheed 
Missile and Space Company Technical Report co-authored with Tung and 
Striebel. Consequently, (39) is commonly known as the Rauch-Tung-Striebel 
smoother. This smoother was derived in [6] using the maximum-likelihood 
method and an outline is provided below. 
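The two passes of the Rauch-Tung-Striebel smoother can be sketched for a scalar time-invariant model as follows. This is an illustration under assumptions rather than a reference implementation: the parameter values, the initial conditions `x0` and `p0`, and the helper names `rts_smoother`, `simulate_scalar` and `mse` are hypothetical, and the smoother gain uses the transition parameter of the same time step.

```python
import random


def rts_smoother(z, a, b, c, q, r, x0=0.0, p0=1.0):
    """Scalar forward Kalman pass followed by the backward pass of (39).

    Returns the filtered estimates and the smoothed estimates.
    """
    n = len(z)
    x_pred, p_pred = x0, p0
    xf, pf, xp, pp = [], [], [], []
    for z_k in z:                                   # forward pass
        gain = p_pred * c / (c * p_pred * c + r)    # filter gain L_k
        x_filt = x_pred + gain * (z_k - c * x_pred)
        p_filt = p_pred - gain * c * p_pred
        xf.append(x_filt)
        pf.append(p_filt)
        x_pred = a * x_filt                         # one-step prediction
        p_pred = a * p_filt * a + b * q * b
        xp.append(x_pred)                           # stored at index k: x_{k+1/k}
        pp.append(p_pred)                           # stored at index k: P_{k+1/k}
    xs = xf[:]                                      # last smoothed = last filtered
    for k in range(n - 2, -1, -1):                  # backward pass
        g = pf[k] * a / pp[k]                       # smoother gain G_k
        xs[k] = xf[k] + g * (xs[k + 1] - xp[k])
    return xf, xs


def simulate_scalar(n, a, b, c, q, r, seed=0):
    """Generate states and measurements for x_{k+1} = a x_k + b w_k, z_k = c x_k + v_k."""
    rng = random.Random(seed)
    x, xs, zs = 0.0, [], []
    for _ in range(n):
        xs.append(x)
        zs.append(c * x + rng.gauss(0.0, r ** 0.5))
        x = a * x + b * rng.gauss(0.0, q ** 0.5)
    return xs, zs


def mse(u, v):
    """Sample mean-square error between two equal-length sequences."""
    return sum((p - s) ** 2 for p, s in zip(u, v)) / len(u)


if __name__ == "__main__":
    x, z = simulate_scalar(2000, 0.9, 1.0, 1.0, 1.0, 1.0)
    xf, xs = rts_smoother(z, 0.9, 1.0, 1.0, 1.0, 1.0)
    print("filter MSE  :", mse(xf, x))
    print("smoother MSE:", mse(xs, x))
```

Only the filtered and predicted quantities from the forward pass need to be stored, which is why this smoother is regarded as a low-complexity state estimator.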


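The notation $x_k \sim \mathcal{N}(\mu, R_{xx})$ means that a discrete-time random variable $x_k \in \mathbb{R}^n$ with
mean $\mu$ and covariance $R_{xx}$ has the normal (or Gaussian) probability density
function

$$p(x_k) = \frac{1}{(2\pi)^{n/2}|R_{xx}|^{1/2}} \exp\left\{-0.5(x_k - \mu)^TR_{xx}^{-1}(x_k - \mu)\right\}. \tag{41}$$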


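Rauch, Tung and Striebel assumed that [6]:

$$\hat{x}_{k+1/N} \sim \mathcal{N}(A_k\hat{x}_{k/N}, B_kQ_kB_k^T), \tag{42}$$
$$\hat{x}_{k/N} \sim \mathcal{N}(\hat{x}_{k/k}, P_{k/k}). \tag{43}$$

From the approach of [6], setting the partial derivative of the logarithm of the joint
density function to zero results in

$$0 = \frac{\partial}{\partial\hat{x}_{k/N}}\left[(\hat{x}_{k+1/N} - A_k\hat{x}_{k/N})^T(B_kQ_kB_k^T)^{-1}(\hat{x}_{k+1/N} - A_k\hat{x}_{k/N})\right] + \frac{\partial}{\partial\hat{x}_{k/N}}\left[(\hat{x}_{k/N} - \hat{x}_{k/k})^TP_{k/k}^{-1}(\hat{x}_{k/N} - \hat{x}_{k/k})\right].$$

Rearranging the above equation leads to

$$\hat{x}_{k/N} = (I + P_{k/k}A_k^T(B_kQ_kB_k^T)^{-1}A_k)^{-1}(\hat{x}_{k/k} + P_{k/k}A_k^T(B_kQ_kB_k^T)^{-1}\hat{x}_{k+1/N}). \tag{44}$$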


From the Matrix Inversion Lemma

$$(I + P_{k/k}A_k^T(B_kQ_kB_k^T)^{-1}A_k)^{-1} = I - G_kA_k. \tag{45}$$

“An ounce of performance is worth pounds of promises.” Mary (Mae) Jane West


The solution (39) is found by substituting (45) into (44). Some further details of 
Rauch, Tung and Striebel’s derivation appear in [13]. 
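
The two-pass structure of (37) – (40) can be sketched in a few lines of Python for a scalar, time-invariant model. This is a minimal illustration under stated assumptions, not the book's implementation: the function names (`kalman_filter`, `rts_smoother`, `mean_square_error`), the model values (a = 0.95, b = c = q = r = 1), the initial conditions and the random seed are all hypothetical choices made for the example.

```python
import random

def kalman_filter(z, a, b, c, q, r, x0=0.0, p0=1.0):
    """Forward pass (37)-(38) for a scalar model (illustrative sketch)."""
    x_pred, p_pred = x0, p0                 # assumed initial x_{1/0} and P_{1/0}
    xf, pf, xp, pp = [], [], [], []
    for zk in z:
        gain = p_pred * c / (c * p_pred * c + r)     # filter gain L_k
        x_corr = x_pred + gain * (zk - c * x_pred)   # corrected estimate (37)
        p_corr = p_pred - gain * c * p_pred          # corrected covariance P_{k/k}
        x_pred = a * x_corr                          # predicted estimate (38)
        p_pred = a * p_corr * a + b * q * b          # predicted covariance P_{k+1/k}
        xf.append(x_corr)
        pf.append(p_corr)
        xp.append(x_pred)
        pp.append(p_pred)
    return xf, pf, xp, pp

def rts_smoother(xf, pf, xp, pp, a):
    """Backward pass (39)-(40); xp[k] and pp[k] hold x_{k+1/k} and P_{k+1/k}."""
    xs = list(xf)                                    # recursion starts from x_{N/N}
    for k in range(len(xf) - 2, -1, -1):
        g = pf[k] * a / pp[k]                        # smoother gain (40)
        xs[k] = xf[k] + g * (xs[k + 1] - xp[k])      # smoothed estimate (39)
    return xs

def mean_square_error(estimates, truth):
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)

# Simulate a hypothetical first-order system and compare both estimators.
random.seed(0)                                       # arbitrary seed for repeatability
a, b, c, q, r, n = 0.95, 1.0, 1.0, 1.0, 1.0, 2000
x, states, z = 0.0, [], []
for _ in range(n):
    states.append(x)
    z.append(c * x + random.gauss(0.0, r ** 0.5))
    x = a * x + b * random.gauss(0.0, q ** 0.5)

xf, pf, xp, pp = kalman_filter(z, a, b, c, q, r)
xs = rts_smoother(xf, pf, xp, pp, a)
mse_filter = mean_square_error(xf, states)
mse_smoother = mean_square_error(xs, states)
```

The backward pass reuses only quantities stored by the forward pass, which is why the smoother needs the complete record before it can start. For this model the smoothed mean-square error should come out below the filtered one.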


7.5.1.2 Alternative Forms 


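The smoother gain (40) can be calculated in different ways. Assuming that $A_k$ is non-singular, it follows from $P_{k+1/k} = A_k P_{k/k} A_k^T + B_k Q_k B_k^T$ that $P_{k/k} A_k^T P_{k+1/k}^{-1} = A_k^{-1} (I - B_k Q_k B_k^T P_{k+1/k}^{-1})$ and

$$G_k = A_k^{-1} (I - B_k Q_k B_k^T P_{k+1/k}^{-1}). \tag{46}$$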
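
A quick numerical check of the alternative gain expression can be made in the scalar case. The values below are hypothetical and chosen only for illustration; the last line evaluates the information-form identity $P_{k/k-1}^{-1} = P_{k/k}^{-1} - C_k^T R_k^{-1} C_k$, which is assumed here to be the relation intended by (47).

```python
# Hypothetical scalar model values, chosen only for illustration.
a, b, c, q, r = 0.95, 1.0, 1.0, 0.5, 2.0
p_pred = 1.2                                     # assumed P_{k/k-1}

# Corrected and predicted error covariances of the Kalman filter.
p_corr = p_pred - p_pred * c / (c * p_pred * c + r) * c * p_pred   # P_{k/k}
p_next = a * p_corr * a + b * q * b                                # P_{k+1/k}

gain_40 = p_corr * a / p_next                    # G_k from (40)
gain_46 = (1.0 - b * q * b / p_next) / a         # G_k from (46); requires a != 0

# Information form: inverse predicted covariance from the corrected one.
info_pred = 1.0 / p_corr - c * c / r
```

Both gain expressions agree to rounding error, and `info_pred` reproduces `1 / p_pred` without inverting the predicted covariance directly.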


In applications where difficulties exist with inverting the predicted error covariance, it may be preferable to calculate

$$P_{k/k-1}^{-1} = P_{k/k}^{-1} - C_k^T R_k^{-1} C_k. \tag{47}$$


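It is shown in [15] that the filter (37) – (38) and the smoother (39) can be written in the following Hamiltonian form

$$\begin{bmatrix} \hat{x}_{k+1/N} \\ \lambda_{k/N} \end{bmatrix} = \begin{bmatrix} A_k & B_k Q_k B_k^T \\ -C_k^T R_k^{-1} C_k & A_k^T \end{bmatrix} \begin{bmatrix} \hat{x}_{k/N} \\ \lambda_{k+1/N} \end{bmatrix} + \begin{bmatrix} 0 \\ C_k^T R_k^{-1} z_k \end{bmatrix}, \tag{48, 49}$$

where the first row is (48), the second row is (49) and $\lambda_{k/N} \in \mathbb{R}^n$ is an auxiliary variable that proceeds backward in time $k$. The form (48) – (49) avoids potential numerical difficulties that may be associated with calculating the inverse of the predicted error covariance.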


To confirm the equivalence of (39) and (48) – (49), use the Bryson-Frazier formula [15]

$$\hat{x}_{k+1/N} = \hat{x}_{k+1/k} + P_{k+1/k} \lambda_{k+1/N}, \tag{50}$$

and (46) within (48) to obtain

$$\hat{x}_{k/N} = G_k \hat{x}_{k+1/N} + A_k^{-1} B_k Q_k B_k^T P_{k+1/k}^{-1} \hat{x}_{k+1/k}. \tag{51}$$

Employing (46) within (51) and rearranging leads to (39).


In time-invariant problems, steady state solutions for $P_{k/k}$ and $P_{k+1/k}$ can be used to precalculate the gain (40) before running the smoother. For example, the application of a time-invariant version of the Rauch-Tung-Striebel smoother for the restoration of blurred images is described in [14].


7.5.1.3 Performance 


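An expression for the smoother error covariance is developed below following the approach of [6], [13]. Define the smoother and filter error states as $\tilde{x}_{k/N} = x_k - \hat{x}_{k/N}$ and $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$, respectively. It is assumed that

$$E\{\tilde{x}_{k/k} \hat{x}_{k/k}^T\} = 0, \tag{52}$$
$$E\{\tilde{x}_{k+1/N} \hat{x}_{k+1/N}^T\} = 0, \tag{53}$$
$$E\{\tilde{x}_{k/N} \hat{x}_{k+1/N}^T\} = 0. \tag{54}$$

It is straightforward to show that (52) implies

$$E\{\hat{x}_{k+1/k} \hat{x}_{k+1/k}^T\} = E\{x_{k+1} x_{k+1}^T\} - P_{k+1/k}. \tag{55}$$

Denote $\Sigma_{k/N} = E\{\tilde{x}_{k/N} \tilde{x}_{k/N}^T\}$. The assumption (53) implies

$$E\{\hat{x}_{k+1/N} \hat{x}_{k+1/N}^T\} = E\{x_{k+1} x_{k+1}^T\} - \Sigma_{k+1/N}. \tag{56}$$

Subtracting $x_k$ from both sides of (39) gives

$$\tilde{x}_{k/N} + G_k \hat{x}_{k+1/N} = \tilde{x}_{k/k} + G_k A_k \hat{x}_{k/k}. \tag{57}$$

By simplifying

$$E\{(\tilde{x}_{k/N} + G_k \hat{x}_{k+1/N})(\tilde{x}_{k/N} + G_k \hat{x}_{k+1/N})^T\} = E\{(\tilde{x}_{k/k} + G_k A_k \hat{x}_{k/k})(\tilde{x}_{k/k} + G_k A_k \hat{x}_{k/k})^T\} \tag{58}$$

and using (52), (54) – (56) yields

$$\Sigma_{k/N} = P_{k/k} - G_k (P_{k+1/k} - \Sigma_{k+1/N}) G_k^T. \tag{59}$$

It can now be shown that the smoother performs better than the Kalman filter.

Lemma 4: Suppose that the sequence (59) is initialised with

$$\Sigma_{N+1/N} = P_{N+1/N}. \tag{60}$$

Then $\Sigma_{k/N} \le P_{k/k}$ for $1 \le k \le N$.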


Proof: The condition (60) implies $\Sigma_{N/N} = P_{N/N}$, which is the initial step for an induction argument. For the induction step, (59) is written as

$$\Sigma_{k/N} = P_{k/k-1} - P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1} - G_k (P_{k+1/k} - \Sigma_{k+1/N}) G_k^T \tag{61}$$

and thus $\Sigma_{k+1/N} \le P_{k+1/k}$ implies $\Sigma_{k/N} \le P_{k/k}$ and $\Sigma_{k/N} \le P_{k/k-1}$.
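
The bound of Lemma 4 can be checked numerically for a scalar model by running the filter covariance recursion forward and then the smoother covariance recursion (59) backward. The sketch below is illustrative only: the function name `smoother_error_covariances`, the parameter values and the initial covariance `p0` are assumptions made for the example.

```python
def smoother_error_covariances(a, b, c, q, r, n, p0=1.0):
    """Return filter covariances P_{k/k} and smoother covariances Sigma_{k/N}."""
    p_pred, pf, pp = p0, [], []
    for _ in range(n):
        # Corrected covariance P_{k/k} from the predicted covariance P_{k/k-1}.
        p_corr = p_pred - (p_pred * c) ** 2 / (c * p_pred * c + r)
        # Predicted covariance P_{k+1/k}.
        p_pred = a * p_corr * a + b * q * b
        pf.append(p_corr)
        pp.append(p_pred)
    sigma = list(pf)                    # sigma[-1] = P_{N/N}, as implied by (60)
    for k in range(n - 2, -1, -1):
        g = pf[k] * a / pp[k]           # smoother gain (40)
        sigma[k] = pf[k] - g * (pp[k] - sigma[k + 1]) * g   # recursion (59)
    return pf, sigma

# Hypothetical first-order example.
pf, sigma = smoother_error_covariances(a=0.95, b=1.0, c=0.1, q=1.0, r=1.0, n=200)
smoother_never_worse = all(s <= p + 1e-12 for s, p in zip(sigma, pf))
```

Away from the end of the interval the smoothed covariance sits strictly below the filtered covariance; at $k = N$ the two coincide because no future data remain.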


7.5.2 The Fraser-Potter Smoother 


Forward and backward estimates may be merged using the data fusion formula 
described in Lemma 6. A variation of the Fraser-Potter discrete-time fixed-interval 
smoother [4] derived by Monzingo [16] is advocated below. 


In the first pass, a Kalman filter produces corrected state estimates $\hat{x}_{k/k}$ and corrected error covariances $P_{k/k}$ from the measurements. In the second pass, a Kalman filter is employed to calculate predicted “backward” state estimates $\xi_{k/k+1}$ and predicted “backward” error covariances $\Sigma_{k/k+1}$ from the time-reversed measurements, that is, from the measurements after time $k$. The smoothed estimate is given by [16]

$$\hat{x}_{k/N} = (P_{k/k}^{-1} + \Sigma_{k/k+1}^{-1})^{-1} (P_{k/k}^{-1} \hat{x}_{k/k} + \Sigma_{k/k+1}^{-1} \xi_{k/k+1}). \tag{62}$$

Alternatively, Kalman filters could be used to derive predicted quantities, $\hat{x}_{k/k-1}$ and $P_{k/k-1}$, from the measurements, and backward corrected quantities $\xi_{k/k}$ and $\Sigma_{k/k}$. Smoothed estimates may then be obtained from the linear combination

$$\hat{x}_{k/N} = (P_{k/k-1}^{-1} + \Sigma_{k/k}^{-1})^{-1} (P_{k/k-1}^{-1} \hat{x}_{k/k-1} + \Sigma_{k/k}^{-1} \xi_{k/k}). \tag{63}$$


It is observed that the fixed-point smoother (24), the fixed-lag smoother (32), the maximum-likelihood smoother (39), the smoothed estimates (62) – (63) and the minimum-variance smoother (which is described subsequently) all use each measurement $z_k$ once.


Note that Fraser and Potter’s original smoother solution [4] and Monzingo’s 
variation [16] are ad hoc and no claims are made about attaining a prescribed level 
of performance. 
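
The merging step in (62) and (63) is an inverse-covariance weighted average of a forward and a backward estimate. A minimal scalar sketch follows; the function name `fuse` and the numbers are hypothetical, and the returned fused variance is the standard result for combining two independent estimates, which is an assumption added here rather than a claim made in this section.

```python
def fuse(x_forward, p_forward, xi_backward, sigma_backward):
    """Inverse-covariance weighted combination of two scalar estimates."""
    total_information = 1.0 / p_forward + 1.0 / sigma_backward
    fused = (x_forward / p_forward + xi_backward / sigma_backward) / total_information
    # Assumed: the fused variance for independent estimates is 1 / total_information.
    return fused, 1.0 / total_information

# Equal confidence in both passes gives the midpoint and halves the variance.
estimate, variance = fuse(1.0, 0.5, 2.0, 0.5)
```

When one pass is much less certain than the other, its weight shrinks accordingly and the fused estimate moves toward the more reliable pass.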


7.5.3 Minimum-Variance Smoothers


7.5.3.1 Optimal Unrealisable Solutions 


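Consider again the estimation problem depicted in Fig. 1 of Chapter 6, where $w$ and $v$ are now discrete-time inputs. As in continuous-time, it is desired to construct a solution $\mathcal{H}$ that produces output estimates $\hat{y}_1$ of a reference system $y_1 = \mathcal{G}_1 w$ from observations $z = y_2 + v$, where $y_2 = \mathcal{G}_2 w$. The objective is to minimise the energy of the output estimation error $e = y_1 - \hat{y}_1$.

The following discussion is perfunctory since it is a regathering of the results from Chapter 6. Recall that the output estimation error is generated by $e = \mathcal{R}_{ei} i$, where $\mathcal{R}_{ei} = -[\mathcal{H} \;\; \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1]$ and $i = \begin{bmatrix} v \\ w \end{bmatrix}$. It has been shown previously that $\mathcal{R}_{ei} \mathcal{R}_{ei}^H = \mathcal{R}_{ei1} \mathcal{R}_{ei1}^H + \mathcal{R}_{ei2} \mathcal{R}_{ei2}^H$, where

$$\mathcal{R}_{ei2} = \mathcal{H} \Delta - \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H}, \tag{64}$$

in which $\Delta$ is known as the Wiener-Hopf factor, which satisfies $\Delta \Delta^H = \mathcal{G}_2 Q \mathcal{G}_2^H + R$. From Lemma 8 of Chapter 6, the smoother solution $\mathcal{H} = \mathcal{G}_1 Q \mathcal{G}_2^H (\Delta \Delta^H)^{-1}$ achieves the best-possible performance, namely, it minimises $\|\mathcal{R}_{ei} \mathcal{R}_{ei}^H\|_2 = \|\mathcal{R}_{ei1} \mathcal{R}_{ei1}^H\|_2$. For example, in output estimation problems $\mathcal{G}_1 = \mathcal{G}_2$ and the optimal smoother simplifies to $\mathcal{H}_{OE} = I - R (\Delta \Delta^H)^{-1}$. From Lemma 9 of Chapter 6, the (causal) filter solution $\mathcal{H}_{OE} = \{\mathcal{G}_2 Q \mathcal{G}_2^H \Delta^{-H} \Delta^{-1}\}_+ = \{\mathcal{G}_2 Q \mathcal{G}_2^H \Delta^{-H}\}_+ \Delta^{-1}$ achieves the best-possible filter performance, that is, it minimises the corresponding norm over causal solutions. The optimal smoother outperforms the optimal filter since the filter's error norm is bounded below by $\|\mathcal{R}_{ei1} \mathcal{R}_{ei1}^H\|_2$, which the smoother attains. The above solutions are termed unrealisable because of the difficulty in obtaining $\Delta$ when $\mathcal{G}_1$ and $\mathcal{G}_2$ are time-varying systems.

Realisable solutions that use an approximate Wiener-Hopf factor in place of $\Delta$ are presented below.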


7.5.3.2 Non-causal Output Estimation


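Suppose that the time-varying linear system $\mathcal{G}_2$ has the realisation (1) – (2). An approximate Wiener-Hopf factor $\hat{\Delta}$ is introduced in [7], [13] and defined by

$$\begin{bmatrix} x_{k+1} \\ \delta_k \end{bmatrix} = \begin{bmatrix} A_k & K_k \Omega_k^{1/2} \\ C_k & \Omega_k^{1/2} \end{bmatrix} \begin{bmatrix} x_k \\ z_k \end{bmatrix}, \tag{65}$$

where $K_k = (A_k P_{k/k-1} C_k^T + B_k Q_k D_k^T) \Omega_k^{-1}$ is the predictor gain, $\Omega_k = C_k P_{k/k-1} C_k^T + D_k Q_k D_k^T + R_k$ and $P_{k/k-1}$ evolves from the Riccati difference equation $P_{k+1/k} = A_k P_{k/k-1} A_k^T - (A_k P_{k/k-1} C_k^T + B_k Q_k D_k^T)(C_k P_{k/k-1} C_k^T + D_k Q_k D_k^T + R_k)^{-1}(C_k P_{k/k-1} A_k^T + D_k Q_k B_k^T) + B_k Q_k B_k^T$. The inverse of the system (65), denoted by $\hat{\Delta}^{-1}$, is obtained using the Matrix Inversion Lemma

$$\begin{bmatrix} \hat{x}_{k+1/k} \\ \alpha_k \end{bmatrix} = \begin{bmatrix} A_k - K_k C_k & K_k \\ -\Omega_k^{-1/2} C_k & \Omega_k^{-1/2} \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix}. \tag{66}$$

The optimal output estimation smoother can be approximated as

$$\mathcal{H}_{OE} = I - R (\hat{\Delta} \hat{\Delta}^H)^{-1} = I - R \hat{\Delta}^{-H} \hat{\Delta}^{-1}. \tag{67}$$

A state-space realisation of (67) is given by (66),

$$\begin{bmatrix} \xi_{k-1} \\ \beta_k \end{bmatrix} = \begin{bmatrix} A_k^T - C_k^T K_k^T & C_k^T \Omega_k^{-1/2} \\ -K_k^T & \Omega_k^{-1/2} \end{bmatrix} \begin{bmatrix} \xi_k \\ \alpha_k \end{bmatrix}, \quad \xi_N = 0, \tag{68}$$

and

$$\hat{y}_{k/N} = z_k - R_k \beta_k. \tag{69}$$

Note that Lemma 1 is used to obtain the realisation (68) of $\hat{\Delta}^{-H} = (\hat{\Delta}^{-1})^H$ from (66). A block diagram of this smoother is provided in Fig. 2. The states $\hat{x}_{k/k-1}$ within (66) are immediately recognisable as belonging to the one-step-ahead predictor. Thus, the optimum realisable solution involves a cascade of familiar building blocks, namely, a Kalman predictor and its adjoint.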


Procedure 1. The above output estimation smoother can be implemented via the 
following three-step procedure. 


Step 1. Operate $\hat{\Delta}^{-1}$ on $z_k$ using (66) to obtain $\alpha_k$.

Step 2. In lieu of the adjoint system (68), operate (66) on the time-reversed transpose of $\alpha_k$. Then take the time-reversed transpose of the result to obtain $\beta_k$.

Step 3. Calculate the smoothed output estimate from (69). 
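
The three steps can be sketched for a scalar model with $D_k = 0$. This is an illustration under stated assumptions rather than a reference implementation: the function name `mv_output_smoother`, the model values, the initial covariance `p0` and the seed are hypothetical, and the backward pass runs the adjoint recursion directly (with $\xi_N = 0$) instead of applying the time-reversal device mentioned in Step 2.

```python
import random

def mv_output_smoother(z, a, b, c, q, r, p0=1.0):
    """Scalar minimum-variance output smoother (illustrative sketch, D = 0)."""
    n = len(z)
    alpha, gains, omegas = [], [], []
    x_pred, p = 0.0, p0                      # assumed x_{1/0} and P_{1/0}
    for zk in z:                             # Step 1: inverse factor, forward in time
        omega = c * p * c + r
        k_gain = a * p * c / omega           # predictor gain K_k
        alpha.append((zk - c * x_pred) / omega ** 0.5)
        x_pred = (a - k_gain * c) * x_pred + k_gain * zk
        p = a * p * a - k_gain * omega * k_gain + b * q * b   # Riccati update
        gains.append(k_gain)
        omegas.append(omega)
    beta = [0.0] * n
    xi = 0.0                                 # adjoint state initialised at zero
    for k in range(n - 1, -1, -1):           # Step 2: adjoint, backward in time
        scale = omegas[k] ** -0.5
        beta[k] = -gains[k] * xi + scale * alpha[k]
        xi = (a - c * gains[k]) * xi + c * scale * alpha[k]
    return [zk - r * bk for zk, bk in zip(z, beta)]   # Step 3: z_k - R beta_k

# Hypothetical first-order example with unit noise variances.
random.seed(1)                               # arbitrary seed for repeatability
a, b, c, q, r, n = 0.95, 1.0, 1.0, 1.0, 1.0, 2000
x, y, z = 0.0, [], []
for _ in range(n):
    y.append(c * x)
    z.append(c * x + random.gauss(0.0, r ** 0.5))
    x = a * x + b * random.gauss(0.0, q ** 0.5)

y_smoothed = mv_output_smoother(z, a, b, c, q, r)
mse_raw = sum((zk - yk) ** 2 for zk, yk in zip(z, y)) / n
mse_smoothed = sum((sk - yk) ** 2 for sk, yk in zip(y_smoothed, y)) / n
```

At the final sample the adjoint state is zero, so the smoothed output there reduces to the filtered output, which is consistent with no future data being available.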

It is shown below that $\hat{y}_{k/N}$ is an unbiased estimate of $y_k$.

Fig. 2. Block diagram of the output estimation smoother

Lemma 5: $E\{\hat{y}_{k/N}\} = E\{y_k\}$.

Proof: Denote the one-step-ahead prediction error by $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$. The output of (66) may be written as $\alpha_k = \Omega_k^{-1/2} C_k \tilde{x}_{k/k-1} + \Omega_k^{-1/2} v_k$ and therefore

$$E\{\alpha_k\} = \Omega_k^{-1/2} C_k E\{\tilde{x}_{k/k-1}\} + \Omega_k^{-1/2} E\{v_k\}. \tag{70}$$

The first term on the right-hand-side of (70) is zero since it pertains to the prediction error of the Kalman filter. The second term is zero since it is assumed that $E\{v_k\} = 0$. Thus $E\{\alpha_k\} = 0$. Since the recursion (68) is initialised with $\xi_N = 0$, it follows that $E\{\xi_k\} = 0$, which implies $E\{\beta_k\} = -K_k^T E\{\xi_k\} + \Omega_k^{-1/2} E\{\alpha_k\} = 0$. Thus, from (69), $E\{\hat{y}_{k/N}\} = E\{z_k\} = E\{y_k\}$, since it is assumed that $E\{v_k\} = 0$.

7.5.3.3 Causal Output Estimation


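The minimum-variance (Kalman) filter is obtained by taking the causal part of the optimum minimum-variance smoother (67)

$$\{\mathcal{H}_{OE}\}_+ = I - R \{\hat{\Delta}^{-H}\}_+ \hat{\Delta}^{-1} = I - R_k \Omega_k^{-1/2} \hat{\Delta}^{-1}. \tag{71}$$

To confirm this linkage between the smoother and filter, denote $L_k = (C_k P_{k/k-1} C_k^T + D_k Q_k D_k^T) \Omega_k^{-1}$ and use (71) to obtain

$$\begin{aligned} \hat{y}_{k/k} &= z_k - R_k \Omega_k^{-1/2} \alpha_k \\ &= R_k \Omega_k^{-1} C_k \hat{x}_{k/k-1} + (I - R_k \Omega_k^{-1}) z_k \\ &= (C_k - L_k C_k) \hat{x}_{k/k-1} + L_k z_k, \end{aligned} \tag{72}$$

which is identical to (34) of Chapter 4.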


7.5.3.4 Input Estimation 


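As discussed in Chapter 6, the optimal realisable smoother for input estimation is

$$\mathcal{H}_{IE} = Q \mathcal{G}_2^H \hat{\Delta}^{-H} \hat{\Delta}^{-1}. \tag{73}$$

The development of a state-space realisation for $\hat{w}_{k/N} = Q \mathcal{G}_2^H \hat{\Delta}^{-H} \alpha_k$ makes use of the formula for the cascade of two systems described in Chapter 6. The smoothed input estimate is realised by

$$\begin{bmatrix} \xi_{k-1} \\ \gamma_{k-1} \\ \hat{w}_{k/N} \end{bmatrix} = \begin{bmatrix} A_k^T - C_k^T K_k^T & 0 & C_k^T \Omega_k^{-1/2} \\ C_k^T K_k^T & A_k^T & -C_k^T \Omega_k^{-1/2} \\ -Q_k D_k^T K_k^T & -Q_k B_k^T & Q_k D_k^T \Omega_k^{-1/2} \end{bmatrix} \begin{bmatrix} \xi_k \\ \gamma_k \\ \alpha_k \end{bmatrix}, \tag{74}$$

in which $\gamma_k \in \mathbb{R}^n$ is an auxiliary state.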


Procedure 2. The above input estimator can be implemented via the following steps.

Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements $z_k$ using (66) to obtain $\alpha_k$.

Step 2. Operate the adjoint of (74) on the time-reversed transpose of $\alpha_k$. Then take the time-reversed transpose of the result.

7.5.3.5 State Estimation


Smoothed state estimates can be obtained by defining the reference system $\mathcal{G}_1 = I$, which yields

$$\hat{x}_{k+1/N} = A_k \hat{x}_{k/N} + B_k \hat{w}_{k/N} = A_k \hat{x}_{k/N} + B_k Q \mathcal{G}_2^H \hat{\Delta}^{-H} \alpha_k. \tag{75}$$

Thus, the minimum-variance smoother for state estimation is given by (66) and (74) – (75). As remarked in Chapter 6, some numerical model order reduction may be required. In the special case of $C_k$ being of rank $n$ and $D_k = 0$, state estimates can be calculated from (69) and

$$\hat{x}_{k/N} = C_k^{\#} \hat{y}_{k/N}, \tag{76}$$

where $C_k^{\#} = (C_k^T C_k)^{-1} C_k^T$ is the Moore-Penrose pseudo-inverse.


7.5.3.6 Performance 


The characterisation of smoother performance requires the following additional notation. Let $\gamma = \mathcal{G}_0 w$ denote the output of the linear time-varying system having the realisation

$$x_{k+1} = A_k x_k + w_k, \tag{77}$$
$$\gamma_k = x_k, \tag{78}$$

where $A_k \in \mathbb{R}^{n \times n}$. By inspection of (77) – (78), the output of the inverse system $w = \mathcal{G}_0^{-1} \gamma$ is given by

$$w_k = \gamma_{k+1} - A_k \gamma_k. \tag{79}$$

Similarly, let $\varepsilon = \mathcal{G}_0^H u$ denote the output of the adjoint system $\mathcal{G}_0^H$, which from Lemma 1 has the realisation

$$\zeta_{k-1} = A_k^T \zeta_k + u_k, \tag{80}$$
$$\varepsilon_k = \zeta_k. \tag{81}$$

It follows that the output of the inverse system $u = \mathcal{G}_0^{-H} \varepsilon$ is given by

$$u_k = \varepsilon_{k-1} - A_k^T \varepsilon_k. \tag{82}$$

The exact Wiener-Hopf factor may now be written as

$$\Delta \Delta^H = C_k \mathcal{G}_0 B_k Q_k B_k^T \mathcal{G}_0^H C_k^T + R_k. \tag{83}$$


The subsequent lemma, which relates the exact and approximate Wiener-Hopf factors, requires the identity

$$P_{k+1} - A_k P_k A_k^T = A_k P_k \mathcal{G}_0^{-H} + \mathcal{G}_0^{-1} P_k A_k^T + \mathcal{G}_0^{-1} P_k \mathcal{G}_0^{-H}, \tag{84}$$

in which $P_k$ is an arbitrary matrix of appropriate dimensions. A verification of (84) is requested in the problems.

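Lemma 6 [7]: In respect of the signal model (1) – (2) with $D_k = 0$, $E\{w_k\} = E\{v_k\} = 0$, $E\{w_j w_k^T\} = Q_k \delta_{jk}$, $E\{v_j v_k^T\} = R_k \delta_{jk}$, $E\{w_k v_k^T\} = 0$ and the quantities defined above,

$$\hat{\Delta} \hat{\Delta}^H = \Delta \Delta^H + C_k \mathcal{G}_0 (P_{k/k-1} - P_{k+1/k}) \mathcal{G}_0^H C_k^T. \tag{85}$$

Proof: The approximate Wiener-Hopf factor may be written as $\hat{\Delta} = C_k \mathcal{G}_0 K_k \Omega_k^{1/2} + \Omega_k^{1/2}$. Using $P_{k/k-1} - A_k P_{k/k-1} A_k^T = -A_k P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1} A_k^T + B_k Q_k B_k^T + P_{k/k-1} - P_{k+1/k}$ within (84) and simplifying leads to (85).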


It can be seen from (85) that the approximate Wiener-Hopf factor approaches the exact Wiener-Hopf factor whenever the estimation problem is locally stationary, that is, when the model and noise parameters vary sufficiently slowly so that $P_{k+1/k} \approx P_{k/k-1}$. Under these conditions, the smoother (69) achieves the best-possible performance, as is shown below.

Lemma 7 [7]: The smoother (67) satisfies

$$\mathcal{R}_{ei2} = R [(\Delta \Delta^H)^{-1} - (\Delta \Delta^H - C_k \mathcal{G}_0 (P_{k+1/k} - P_{k/k-1}) \mathcal{G}_0^H C_k^T)^{-1}] \Delta. \tag{86}$$

Proof: Substituting (67) into (64) yields

$$\mathcal{R}_{ei2} = R [(\Delta \Delta^H)^{-1} - (\hat{\Delta} \hat{\Delta}^H)^{-1}] \Delta. \tag{87}$$

The result (86) is now immediate from (85) and (87).


Some conditions under which $P_{k+1/k}$ asymptotically approaches $P_{k/k-1}$ and the smoother (67) attains optimal performance are set out below.

Lemma 8 [8]: Suppose
(i) for a $t > 0$ that there exist solutions $P_t \ge P_{t+1} \ge 0$ of the Riccati difference equation $P_{k+1} = A_k P_k A_k^T - A_k P_k C_k^T (C_k P_k C_k^T + R_k)^{-1} C_k P_k A_k^T + B_k Q_k B_k^T$; and
(ii) $\begin{bmatrix} B_{t+k+1} Q_{t+k+1} B_{t+k+1}^T & A_{t+k+1}^T \\ A_{t+k+1} & -C_{t+k+1}^T R_{t+k+1}^{-1} C_{t+k+1} \end{bmatrix} \le \begin{bmatrix} B_{t+k} Q_{t+k} B_{t+k}^T & A_{t+k}^T \\ A_{t+k} & -C_{t+k}^T R_{t+k}^{-1} C_{t+k} \end{bmatrix}$ for all $k \ge 0$.
Then the smoother (67) achieves

$$\lim_{t \to \infty} \|\mathcal{R}_{ei2} \mathcal{R}_{ei2}^H\|_2 = 0. \tag{88}$$

Proof: Conditions (i) and (ii) together with Theorem 1 imply $P_{k/k-1} \ge P_{k+1/k}$ for all $k \ge 0$ and $P_{k/k-1} - P_{k+1/k} \to 0$. The claim (88) is now immediate from Theorem 2.


An example that illustrates the performance benefit of the minimum-variance 
smoother (69) is described below. 


Example 2 [9]. The nominal drift rate of high quality inertial navigation systems is around one nautical mile per hour, which corresponds to position errors of about 617 m over a twenty minute period. Thus, inertial navigation systems alone cannot be used to control underground mining equipment. An approach which has been found to be successful in underground mines is called dead reckoning, where the Euler angles, $\theta_k$, $\phi_k$ and $\psi_k$, reported by an inertial navigation system are combined with external odometer measurements, $d_k$. The dead reckoning position estimates in x-y-z coordinates are calculated as

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{bmatrix} = \begin{bmatrix} x_k \\ y_k \\ z_k \end{bmatrix} + (d_k - d_{k-1}) \begin{bmatrix} \sin(\theta_k) \\ \sin(\phi_k) \\ \sin(\psi_k) \end{bmatrix}. \tag{89}$$


A filter or a smoother may then be employed to improve the noisy position estimates calculated from (89). Euler angles were generated using

$$\begin{bmatrix} \theta_{k+1} \\ \phi_{k+1} \\ \psi_{k+1} \end{bmatrix} = A \begin{bmatrix} \theta_k \\ \phi_k \\ \psi_k \end{bmatrix} + \begin{bmatrix} w_k^{(1)} \\ w_k^{(2)} \\ w_k^{(3)} \end{bmatrix},$$

with $A = \begin{bmatrix} 0.95 & 0 & 0 \\ 0 & 0.95 & 0 \\ 0 & 0 & 0.95 \end{bmatrix}$ and $w_k^{(i)} \sim \mathcal{N}(0, 0.01)$, $i = 1, \ldots, 3$. Simulations were conducted with 1000 realisations of Gaussian
measurement noise added to position estimates calculated from (89). The mean- 
square error exhibited by the minimum-variance filter and smoother operating on 
the noisy dead reckoning estimates are shown in Fig. 3. It can be seen that filtering 
the noisy dead reckoning positions can provide a significant mean-square-error 
improvement. The figure also demonstrates that the smoother can offer a few dB 
of further improvement at mid-range signal-to-noise ratios. 
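
The dead reckoning update (89) amounts to accumulating odometer increments resolved through the sines of the reported angles. A minimal sketch follows; the function name `dead_reckon`, the argument layout and the test trajectory are hypothetical, and the angles are assumed to be in radians.

```python
import math

def dead_reckon(start, angles, odometer):
    """Accumulate positions from odometer increments and Euler angles.

    start    : initial (x, y, z) position
    angles   : list of (theta, phi, psi) tuples in radians, one per sample
    odometer : list of cumulative odometer readings d_k, one per sample
    """
    path = [start]
    for k in range(1, len(odometer)):
        step = odometer[k] - odometer[k - 1]        # increment d_k - d_{k-1}
        theta, phi, psi = angles[k]
        x, y, z = path[-1]
        path.append((x + step * math.sin(theta),
                     y + step * math.sin(phi),
                     z + step * math.sin(psi)))
    return path

# Two unit steps with theta = 90 degrees move the x coordinate by two units.
quarter_turn = (math.pi / 2, 0.0, 0.0)
path = dead_reckon((0.0, 0.0, 0.0),
                   [(0.0, 0.0, 0.0), quarter_turn, quarter_turn],
                   [0.0, 1.0, 2.0])
```

Because each position is the running sum of all earlier increments, angle and odometer errors accumulate over time, which is what motivates filtering or smoothing the resulting estimates.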


Fig. 3. Mean-square-error of the position estimate versus input signal to noise ratio (SNR, in dB) for Example 2: (i) noisy dead reckoning data, (ii) Kalman filter, and (iii) minimum-variance smoother (69).

Fig. 4. Mean-square-error versus measurement noise covariance for Example 3: (i) Kalman filter, (ii) fixed-lag smoothers, and (iii) optimal minimum-variance smoother (69).


7.5.4 Performance Comparison 


It has been demonstrated by the previous examples that the optimal fixed-interval 
smoother provides a performance improvement over the maximum-likelihood 
smoother. The remaining example of this section compares the behaviour of the 
fixed-lag and the optimum fixed-interval smoother. 


Example 3. Simulations were conducted for a first-order output estimation problem, in which $A = 0.95$, $B = 1$, $C = 0.1$, $Q = 1$, $R = \{0.01, 0.02, 0.5, 1, 1.5, 2\}$ and $N = 20{,}000$. The mean-square-errors exhibited by the Kalman filter and the optimum fixed-interval smoother (69) are indicated by the top and bottom solid lines of Fig. 4, respectively. Fourteen fixed-lag smoother output error covariances, $C P_{k+1/k}^{(i,i)} C^T$, $i = 2, \ldots, 15$, were calculated from (35) – (36) and are indicated by the dotted lines of Fig. 4. The figure illustrates that the fixed-lag smoother mean-square errors are bounded above and below by those of the Kalman filter and optimal fixed-interval smoother, respectively. Thus, an option for asymptotically attaining optimal performance is to employ Moore’s reduced-order fixed-lag solution [1] – [3] with a sufficiently long lag.


7.6 Chapter Summary 


Solutions to the fixed-point and fixed-lag smoothing problems can be found by 
applying the standard Kalman filter recursions to augmented systems. Where 
possible, it is shown that the smoother error covariances are less than the filter 
error covariance, namely, the fixed-point and fixed-lag smoothers provide a 
performance improvement over the filter. 


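Signal model
   Assumptions: $E\{w_k\} = E\{v_k\} = 0$. $E\{w_k w_k^T\} = Q_k > 0$ and $E\{v_k v_k^T\} = R_k > 0$ are known. $A_k$, $B_k$, $C_{1,k}$ and $C_{2,k}$ are known.
   Main results: $x_{k+1} = A_k x_k + B_k w_k$, $y_{2,k} = C_{2,k} x_k$, $z_k = y_{2,k} + v_k$, $y_{1,k} = C_{1,k} x_k$.

Rauch-Tung-Striebel smoother
   Assumptions: Assumes that the filtered and smoothed states are normally distributed. $\hat{x}_{k+1/k}$ previously calculated by Kalman filter.
   Main results: $\hat{x}_{k/N} = \hat{x}_{k/k} + G_k (\hat{x}_{k+1/N} - \hat{x}_{k+1/k})$, $\hat{y}_{1,k/N} = C_{1,k} \hat{x}_{k/N}$.

Fraser-Potter smoother
   Assumptions: $\hat{x}_{k/k}$ and $\xi_{k/k+1}$ previously calculated by forward and backward Kalman filters.
   Main results: $\hat{x}_{k/N} = (P_{k/k}^{-1} + \Sigma_{k/k+1}^{-1})^{-1} (P_{k/k}^{-1} \hat{x}_{k/k} + \Sigma_{k/k+1}^{-1} \xi_{k/k+1})$, $\hat{y}_{1,k/N} = C_{1,k} \hat{x}_{k/N}$.

Optimal minimum-variance smoother
   Main results:
   $\begin{bmatrix} \hat{x}_{k+1/k} \\ \alpha_k \end{bmatrix} = \begin{bmatrix} A_k - K_k C_{2,k} & K_k \\ -\Omega_k^{-1/2} C_{2,k} & \Omega_k^{-1/2} \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix}$,
   $\begin{bmatrix} \xi_{k-1} \\ \beta_k \end{bmatrix} = \begin{bmatrix} A_k^T - C_{2,k}^T K_k^T & C_{2,k}^T \Omega_k^{-1/2} \\ -K_k^T & \Omega_k^{-1/2} \end{bmatrix} \begin{bmatrix} \xi_k \\ \alpha_k \end{bmatrix}$,
   $\hat{y}_{2,k/N} = z_k - R_k \beta_k$.

Table 1. Main results for discrete-time fixed-interval smoothing.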


Table 1 summarises three common fixed-interval smoothers that operate on measurements $z_k = y_{2,k} + v_k$ of a system $\mathcal{G}_2$ realised by $x_{k+1} = A_k x_k + B_k w_k$ and $y_{2,k} = C_{2,k} x_k$. Monzingo modified the Fraser-Potter smoother solution so that each measurement is only used once. Rauch, Tung and Striebel employed the maximum-likelihood method to derive their fixed-interval smoother in which $G_k = P_{k/k} A_k^T P_{k+1/k}^{-1}$ is a gain matrix. Although this is not a minimum-mean-square-error solution, it outperforms the Kalman filter and can provide close to optimal performance whenever the underlying Gaussian-distribution assumptions are reasonable.

The minimum-variance smoothers are state-space generalisations of the optimal noncausal Wiener solutions. They make use of the inverse of the approximate Wiener-Hopf factor $\hat{\Delta}^{-1}$ and its adjoint $\hat{\Delta}^{-H}$. These smoothers achieve the best-possible performance; however, they are not minimum-order solutions. Consequently, any performance benefits need to be reconciled against their increased complexity.


7.7 Problems 


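Problem 1.
(i) Simplify the fixed-lag smoother

$$\begin{bmatrix} \hat{x}_{k+1/k} \\ \hat{x}_{k/k} \\ \hat{x}_{k-1/k} \\ \vdots \\ \hat{x}_{k-N+1/k} \end{bmatrix} = \begin{bmatrix} A_k & 0 & \cdots & \cdots & 0 \\ I_n & 0 & & & \vdots \\ 0 & I_n & \ddots & & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & \cdots & 0 & I_n & 0 \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ \hat{x}_{k-1/k-1} \\ \hat{x}_{k-2/k-1} \\ \vdots \\ \hat{x}_{k-N/k-1} \end{bmatrix} + \begin{bmatrix} K_{0,k} \\ K_{1,k} \\ K_{2,k} \\ \vdots \\ K_{N,k} \end{bmatrix} (z_k - C_k \hat{x}_{k/k-1})$$

to obtain an expression for the components of the smoothed state.
(ii) Derive expressions for the two predicted error covariance submatrices of interest.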


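Problem 2.
(i) With the quantities defined in Section 4 and the assumptions $\hat{x}_{k+1/N} \sim \mathcal{N}(A_k \hat{x}_{k/N}, B_k Q_k B_k^T)$, $\hat{x}_{k/N} \sim \mathcal{N}(\hat{x}_{k/k}, P_{k/k})$, use the maximum-likelihood method to derive

$$\hat{x}_{k/N} = (I + P_{k/k} A_k^T (B_k Q_k B_k^T)^{-1} A_k)^{-1} (\hat{x}_{k/k} + P_{k/k} A_k^T (B_k Q_k B_k^T)^{-1} \hat{x}_{k+1/N}).$$

(ii) Use the Matrix Inversion Lemma to obtain Rauch, Tung and Striebel’s smoother

$$\hat{x}_{k/N} = \hat{x}_{k/k} + G_k (\hat{x}_{k+1/N} - \hat{x}_{k+1/k}).$$

(iii) Employ the additional assumptions $E\{\tilde{x}_{k/k} \hat{x}_{k/k}^T\} = 0$, $E\{\tilde{x}_{k+1/N} \hat{x}_{k+1/N}^T\} = 0$ and $E\{\tilde{x}_{k/N} \hat{x}_{k+1/N}^T\} = 0$ to show that $E\{\hat{x}_{k+1/k} \hat{x}_{k+1/k}^T\} = E\{x_{k+1} x_{k+1}^T\} - P_{k+1/k}$, $E\{\hat{x}_{k+1/N} \hat{x}_{k+1/N}^T\} = E\{x_{k+1} x_{k+1}^T\} - \Sigma_{k+1/N}$ and $\Sigma_{k/N} = P_{k/k} - G_k (P_{k+1/k} - \Sigma_{k+1/N}) G_k^T$.

(iv) Use $G_k = A_k^{-1} (I - B_k Q_k B_k^T P_{k+1/k}^{-1})$ and $\hat{x}_{k+1/N} = \hat{x}_{k+1/k} + P_{k+1/k} \lambda_{k+1/N}$ to confirm that the smoothed estimate within

$$\begin{bmatrix} \hat{x}_{k+1/N} \\ \lambda_{k/N} \end{bmatrix} = \begin{bmatrix} A_k & B_k Q_k B_k^T \\ -C_k^T R_k^{-1} C_k & A_k^T \end{bmatrix} \begin{bmatrix} \hat{x}_{k/N} \\ \lambda_{k+1/N} \end{bmatrix} + \begin{bmatrix} 0 \\ C_k^T R_k^{-1} z_k \end{bmatrix}$$

is equivalent to Rauch, Tung and Striebel’s maximum-likelihood solution.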


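Problem 3. Let $\gamma = \mathcal{G}_0 w$ denote the output of a linear time-varying system having the realisation $x_{k+1} = A_k x_k + w_k$, $\gamma_k = x_k$. Verify that $P_{k+1} - A_k P_k A_k^T = A_k P_k \mathcal{G}_0^{-H} + \mathcal{G}_0^{-1} P_k A_k^T + \mathcal{G}_0^{-1} P_k \mathcal{G}_0^{-H}$.

Problem 4. For the model (1) – (2), assume that $D_k = 0$, $E\{w_k\} = E\{v_k\} = 0$, $E\{w_j w_k^T\} = Q_k \delta_{jk}$, $E\{v_j v_k^T\} = R_k \delta_{jk}$ and $E\{w_k v_k^T\} = 0$. Use the result of Problem 3 to show that $\hat{\Delta} \hat{\Delta}^H = \Delta \Delta^H - C_k \mathcal{G}_0 (P_{k+1/k} - P_{k/k-1}) \mathcal{G}_0^H C_k^T$.

Problem 5. Under the assumptions of Problem 4, obtain an expression relating $\hat{\Delta} \hat{\Delta}^H$ and $\Delta \Delta^H$ for the case where $D_k \ne 0$.

Problem 6. From $\mathcal{R}_{ei} = -[\mathcal{H} \;\; \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1]$, $\mathcal{R}_{ei} \mathcal{R}_{ei}^H = \mathcal{R}_{ei1} \mathcal{R}_{ei1}^H + \mathcal{R}_{ei2} \mathcal{R}_{ei2}^H$ and $\mathcal{R}_{ei2} = \mathcal{H} \Delta - \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H}$, obtain the optimal realisable smoother solutions for output estimation, input estimation and state estimation problems.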


7.8 Glossary 


$p(x_k)$   Probability density function of a discrete random variable $x_k$.

$x_k \sim \mathcal{N}(\mu, R_{xx})$   The random variable $x_k$ has a normal distribution with mean $\mu$ and covariance $R_{xx}$.

Qn, Error covariance of the fixed-point smoother. 

Pe Error covariance of the fixed-lag smoother. 

$\hat{w}_{k/N}$, $\hat{x}_{k/N}$, $\hat{y}_{k/N}$   Estimates of $w_k$, $x_k$ and $y_k$ at time $k$, given data $z_k$ over an interval $k \in [0, N]$.

$G_k$   Gain of the smoother developed by Rauch, Tung and Striebel.

$\alpha_k$   Output of $\hat{\Delta}^{-1}$, the inverse of the approximate Wiener-Hopf factor.

$\beta_k$   Output of $\hat{\Delta}^{-H}$, the adjoint of the inverse of the approximate Wiener-Hopf factor.

$C_k^{\#}$   Moore-Penrose pseudoinverse of $C_k$.

$\mathcal{R}_{ei}$   A system (or map) that operates on the problem inputs $i = \begin{bmatrix} v \\ w \end{bmatrix}$ to produce an estimation error $e$. It is convenient to make use of the factorisation $\mathcal{R}_{ei} \mathcal{R}_{ei}^H = \mathcal{R}_{ei1} \mathcal{R}_{ei1}^H + \mathcal{R}_{ei2} \mathcal{R}_{ei2}^H$, where $\mathcal{R}_{ei2} \mathcal{R}_{ei2}^H$ includes the filter/smoother solution and $\mathcal{R}_{ei1} \mathcal{R}_{ei1}^H$ is a lower performance bound.

8. Parameter Estimation


8.1 Introduction 


Predictors, filters and smoothers have previously been described for state recovery 
under the assumption that the parameters of the generating models are correct. 
More often than not, the problem parameters are unknown and need to be 
identified. This section describes some standard statistical techniques for 
parameter estimation. Paradoxically, the discussed parameter estimation methods 
rely on having complete state information available. Although this is akin to a 
chicken-and-egg argument (state availability obviates the need for filters along 
with their attendant requirements for identified models), the task is not 
insurmountable. 


The role of solution designers is to provide a cost benefit. That is, their objectives 
are to deliver improved performance at an acceptable cost. Inevitably, this requires 
simplifications so that the problems become sufficiently tractable and amenable to 
feasible solution. For example, suppose that speech emanating from a radio is too 
noisy and barely intelligible. In principle, high-order models could be proposed to 
equalise the communication channel, demodulate the baseband signal and recover 
the phonemes. Typically, low-order solutions tend to offer better performance 
because of the difficulty in identifying large numbers of parameters under low- 
SNR conditions. Consider also the problem of monitoring the output of a gas 
sensor and triggering alarms when environmental conditions become hazardous. 
Complex models could be constructed to take into account diurnal pressure 
variations, local weather influences and transients due to passing vehicles. It often 
turns out that low-order solutions exhibit lower false alarm rates because there are 
fewer assumptions susceptible to error. Thus, the absence of complete information 
need not inhibit solution development. Simple schemes may suffice, such as 
conducting trials with candidate parameter values and assessing the consequent 
error performance. 


In maximum-likelihood estimation [1] – [5], unknown parameters $\theta_1, \theta_2, \ldots, \theta_M$ are identified given states, $x_k$, by maximising a log-likelihood function, $\log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)$. For example, the subject of noise variance estimation was studied by Mehra in [6], where maximum-likelihood estimates (MLEs) were updated using the Newton-Raphson method. Rife and Boorstyn obtained Cramér-Rao bounds for some MLEs, which “indicate the best estimation that can be made with the available data” [7]. Nayak et al used the pseudo-inverse to estimate unknown parameters in [8]. Bélanger subsequently employed a least-squares approach to estimate the process noise and measurement noise variances [9]. A recursive technique for least-squares parameter estimation was developed by Strejc [10]. Dempster, Laird and Rubin [11] proved the convergence of a general purpose technique for solving joint state and parameter estimation problems, which they called the expectation-maximisation (EM) algorithm. They addressed problems where complete (state) information is not available to calculate the log-likelihood and instead maximised the expectation of $\log f(\theta_1, \theta_2, \ldots, \theta_M \mid z_k)$, given incomplete measurements, $z_k$. That is, by virtue of Jensen’s inequality the unknowns are found by using an objective function (which is also called an approximate log-likelihood function), $E\{\log f(\theta_1, \theta_2, \ldots, \theta_M \mid z_k)\}$, as a surrogate for $\log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)$.


The system identification literature is vast and some mature techniques have 
evolved. Subspace identification methods have been developed for general 
problems where a system’s stochastic inputs, deterministic inputs and outputs are 
available. The subspace algorithms [12] — [14] consist of two steps. First, the order 
of the system is identified from stacked vectors of the inputs and outputs. Then the 
unknown parameters are determined from an extended observability matrix. 


Continuous-time maximum-likelihood estimation has been mentioned previously. 
Here, the attention is focussed on the specific problem of joint state and parameter 
estimation exclusively from discrete measurements of a system’s outputs. The 
developments proceed as follows. Section 8.2 reviews the maximum-likelihood 
estimation method for obtaining unknown parameters. The same estimates can be 
found using the method of least squares, which was pioneered by Gauss for fitting 
astronomical observations. Well known (filtering) EM algorithms for variance and 
state matrix estimation are described in Section 8.3. Improved parameter 
estimation accuracy can be obtained via smoothing EM algorithms, which are 
introduced in Section 8.4. 


Although EM algorithms can yield improved state matrix and input process 
covariance estimates, it has been found that they are only accurate when the 
measurement noise is negligible. Similarly in subspace identification, least- 
squares estimation of unknown state-space matrices can lead to biased results 
when the states are corrupted by noise. This arises because the standard least- 
squares and maximum-likelihood estimates of a state matrix and an input process 
covariance themselves are biased in the presence of measurement noise. Therefore, 
a correction term is introduced in Section 8.5 to eliminate the bias error. This
yields unbiased, consistent, closed-form estimates of a state matrix, an input
process covariance and a measurement noise covariance. Under simplifying
conditions they are equal to MLEs and attain the corresponding Cramér-Rao
Lower Bounds (CRLBs).


The use of the MLEs, filtering and smoothing EM algorithms discussed herein
requires caution. When perfect information and sufficiently large sample sizes are
available, the corresponding likelihood functions are exact. However, the use of 
imperfect information leads to approximate likelihood functions and biased MLEs, 
which can degrade parameter estimation accuracy and follow-on filter/smoother 
performance. 


8.2. Maximum-Likelihood Estimation 


8.2.1 General Method 


Let $p(\theta \mid x_k)$ denote the probability density function of an unknown parameter $\theta$, given samples of a discrete random variable $x_k$. An estimate, $\hat{\theta}$, can be obtained by finding the argument $\theta$ that maximises the probability density function, that is,

$$\hat{\theta} = \arg\max_{\theta} \; p(\theta \mid x_k). \tag{1}$$

A solution can be found by setting $\dfrac{\partial p(\theta \mid x_k)}{\partial \theta} = 0$ and solving for the unknown $\theta$. Since the logarithm function is monotonic, a solution may be found equivalently by maximising

$$\hat{\theta} = \arg\max_{\theta} \; \log p(\theta \mid x_k) \tag{2}$$

and setting $\dfrac{\partial \log p(\theta \mid x_k)}{\partial \theta} = 0$. For exponential families of distributions, the use of (2) considerably simplifies the equations to be solved.

Suppose that $N$ mutually independent samples of $x_k$ are available, then the joint density function of all the observations is the product of the densities

$$f(\theta \mid x_k) = p(\theta \mid x_1) p(\theta \mid x_2) \cdots p(\theta \mid x_N) = \prod_{k=1}^{N} p(\theta \mid x_k), \tag{3}$$

which serves as a likelihood function. The MLE of $\theta$ may be found by maximising the log-likelihood

$$\hat{\theta} = \arg\max_{\theta} \; \log f(\theta \mid x_k) = \arg\max_{\theta} \sum_{k=1}^{N} \log p(\theta \mid x_k) \tag{4}$$

by solving for a $\theta$ that satisfies $\dfrac{\partial \log f(\theta \mid x_k)}{\partial \theta} = \sum_{k=1}^{N} \dfrac{\partial \log p(\theta \mid x_k)}{\partial \theta} = 0$. The above maximum-likelihood approach is applicable to a wide range of distributions. For example, the task of estimating the intensity of a Poisson distribution from measurements is demonstrated below.


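Example 1. Suppose that $N$ observations of integer $x_k$ have a Poisson distribution $f(x_k) = \frac{e^{-\mu}\mu^{x_k}}{x_k!}$, where the intensity, $\mu$, is unknown. The corresponding log-likelihood function is

$$\begin{aligned}
\log f(\mu \mid x_k) &= \log\left( \frac{e^{-\mu}\mu^{x_1}}{x_1!}\, \frac{e^{-\mu}\mu^{x_2}}{x_2!} \cdots \frac{e^{-\mu}\mu^{x_N}}{x_N!} \right) \\
&= \log\left( \frac{1}{x_1!\, x_2! \cdots x_N!}\, \mu^{x_1 + x_2 + \cdots + x_N}\, e^{-N\mu} \right) \\
&= -\log(x_1!\, x_2! \cdots x_N!) + \log\left(\mu^{x_1 + x_2 + \cdots + x_N}\right) - N\mu. 
\end{aligned} \tag{5}$$

Taking $\frac{\partial \log f(\mu \mid x_k)}{\partial \mu} = \frac{1}{\mu}\sum_{k=1}^{N} x_k - N = 0$ yields

$$\hat{\mu} = \frac{1}{N}\sum_{k=1}^{N} x_k. \tag{6}$$

Since $\frac{\partial^2 \log f(\mu \mid x_k)}{\partial \mu^2} = -\frac{1}{\mu^2}\sum_{k=1}^{N} x_k$ is negative for all $\mu$ and $x_k > 0$, the stationary point (6) occurs at a maximum of (5). That is to say, $\hat{\mu}$ is indeed a maximum-likelihood estimate.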
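The result of Example 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the seed and the true intensity of 4 are illustrative choices rather than part of the text. It evaluates the sample-mean estimate (6) and confirms that the derivative of the log-likelihood (5) vanishes there.

```python
import numpy as np

def poisson_mle(samples):
    """Sample mean, i.e. the stationary point (6) of the log-likelihood (5)."""
    samples = np.asarray(samples, dtype=float)
    return samples.sum() / samples.size

def poisson_score(mu, samples):
    """First derivative of the log-likelihood (5) with respect to mu."""
    samples = np.asarray(samples, dtype=float)
    return samples.sum() / mu - samples.size

rng = np.random.default_rng(0)         # assumed seed, for repeatability only
x = rng.poisson(lam=4.0, size=5000)    # assumed true intensity
mu_hat = poisson_mle(x)
print(mu_hat, poisson_score(mu_hat, x))
```

The score is positive below the estimate and negative above it, which is the numerical counterpart of the negative second derivative noted in the example.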


“Therefore I would not have it unknown to Your Holiness, the only thing which induced me to look for 
another way of reckoning the movements of the heavenly bodies was that I knew that mathematicians 
by no means agree in their investigation thereof.” Nicolaus Copernicus“ 
8.2.2 State Matrix Estimation


From the Central Limit Theorem, which was mentioned in Chapter 6, the mean of a sufficiently large sample of independent identically distributed random variables will asymptotically approach a normal distribution. Consequently, in many maximum-likelihood estimation applications it is assumed that random variables are normally distributed. Recall that the normal (or Gaussian) probability density function of a discrete random variable $x_k \in \mathbb{R}^n$ with mean $\mu$ and covariance $R_{xx}$ is

$$p(x_k) = \frac{1}{(2\pi)^{n/2}\,|R_{xx}|^{1/2}} \exp\left\{ -\frac{1}{2}(x_k - \mu)^T R_{xx}^{-1}(x_k - \mu) \right\}, \tag{7}$$

in which $|R_{xx}|$ denotes the determinant of $R_{xx}$. A likelihood function for a sample of $N$ independently identically distributed random variables is

$$f(x_k) = \prod_{k=1}^{N} p(x_k) = \frac{1}{(2\pi)^{nN/2}\,|R_{xx}|^{N/2}} \exp\left\{ -\frac{1}{2}\sum_{k=1}^{N}(x_k - \mu)^T R_{xx}^{-1}(x_k - \mu) \right\}. \tag{8}$$

In general, it is more convenient to work with the log-likelihood function

$$\log f(x_k) = -\log\left( (2\pi)^{nN/2}\,|R_{xx}|^{N/2} \right) - \frac{1}{2}\sum_{k=1}^{N}(x_k - \mu)^T R_{xx}^{-1}(x_k - \mu). \tag{9}$$


An example of estimating a model coefficient using the Gaussian log-likelihood 
approach is set out below. 


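Example 2. Consider an autoregressive order-one process $x_{k+1} = a_0 x_k + w_k$, with $E\{w_k^2\} = \sigma_w^2$, in which it is desired to estimate $a_0 \in \mathbb{R}$ from samples of $x_k$. It follows from $x_{k+1} \sim \mathcal{N}(a_0 x_k, \sigma_w^2)$ that

$$\log f(a_0 \mid x_{k+1}) = -\log\left( (2\pi)^{N/2}\sigma_w^{N} \right) - \frac{1}{2}\sigma_w^{-2}\sum_{k=1}^{N}(x_{k+1} - a_0 x_k)^2.$$

Setting $\frac{\partial \log f(a_0 \mid x_{k+1})}{\partial a_0}$ equal to zero gives $\sum_{k=1}^{N} x_{k+1}x_k = a_0\sum_{k=1}^{N} x_k^2$, which results in the MLE

$$\hat{a}_0 = \frac{\sum_{k=1}^{N} x_{k+1}x_k}{\sum_{k=1}^{N} x_k^2}.$$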

“How wonderful that we have met with a paradox. Now we have some hope of making progress.” 
Niels Henrik David Bohr 
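Often within filtering and smoothing applications there are multiple parameters to be identified. Denote the unknown parameters by $\theta_1, \theta_2, \ldots, \theta_M$, then the MLEs may be found by solving the $M$ equations

$$\begin{aligned}
\frac{\partial \log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)}{\partial \theta_1} &= 0 \\
\frac{\partial \log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)}{\partial \theta_2} &= 0 \\
&\;\;\vdots \\
\frac{\partial \log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)}{\partial \theta_M} &= 0.
\end{aligned}$$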


A vector parameter estimation example is outlined below. 


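Example 3. Consider the third-order autoregressive model

$$x_{k+3} + a_2 x_{k+2} + a_1 x_{k+1} + a_0 x_k = w_k, \tag{10}$$

which can be written in the state-space form

$$\begin{bmatrix} x_{1,k+1} \\ x_{2,k+1} \\ x_{3,k+1} \end{bmatrix} = \begin{bmatrix} -a_2 & -a_1 & -a_0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,k} \\ x_{2,k} \\ x_{3,k} \end{bmatrix} + \begin{bmatrix} w_k \\ 0 \\ 0 \end{bmatrix}. \tag{11}$$

Assuming $x_{1,k+1} \sim \mathcal{N}(-a_2 x_{1,k} - a_1 x_{2,k} - a_0 x_{3,k},\ \sigma_w^2)$ and setting to zero the partial derivatives of the corresponding log-likelihood function with respect to $a_0$, $a_1$ and $a_2$ yields

$$\begin{bmatrix} \sum_{k=1}^{N} x_{3,k}^2 & \sum_{k=1}^{N} x_{2,k}x_{3,k} & \sum_{k=1}^{N} x_{1,k}x_{3,k} \\ \sum_{k=1}^{N} x_{2,k}x_{3,k} & \sum_{k=1}^{N} x_{2,k}^2 & \sum_{k=1}^{N} x_{1,k}x_{2,k} \\ \sum_{k=1}^{N} x_{1,k}x_{3,k} & \sum_{k=1}^{N} x_{1,k}x_{2,k} & \sum_{k=1}^{N} x_{1,k}^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = -\begin{bmatrix} \sum_{k=1}^{N} x_{1,k+1}x_{3,k} \\ \sum_{k=1}^{N} x_{1,k+1}x_{2,k} \\ \sum_{k=1}^{N} x_{1,k+1}x_{1,k} \end{bmatrix}. \tag{12}$$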


Hence, the MLEs are given by 


“If we all worked on the assumption that what is accepted as true is really true, there would be little 
hope of advance.” Orville Wright 
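$$\begin{bmatrix} \hat{a}_0 \\ \hat{a}_1 \\ \hat{a}_2 \end{bmatrix} = -\begin{bmatrix} \sum_{k=1}^{N} x_{3,k}^2 & \sum_{k=1}^{N} x_{2,k}x_{3,k} & \sum_{k=1}^{N} x_{1,k}x_{3,k} \\ \sum_{k=1}^{N} x_{2,k}x_{3,k} & \sum_{k=1}^{N} x_{2,k}^2 & \sum_{k=1}^{N} x_{1,k}x_{2,k} \\ \sum_{k=1}^{N} x_{1,k}x_{3,k} & \sum_{k=1}^{N} x_{1,k}x_{2,k} & \sum_{k=1}^{N} x_{1,k}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{k=1}^{N} x_{1,k+1}x_{3,k} \\ \sum_{k=1}^{N} x_{1,k+1}x_{2,k} \\ \sum_{k=1}^{N} x_{1,k+1}x_{1,k} \end{bmatrix}. \tag{13}$$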
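The vector estimate of Example 3 amounts to solving a 3 x 3 set of normal equations. The following sketch assumes NumPy and orders the unknowns as $[a_0, a_1, a_2]$; the simulated coefficients (chosen so that the model poles are 0.5, 0.6 and -0.4), the sample size and the seed are illustrative assumptions.

```python
import numpy as np

def ar3_mle(x):
    """Solve the normal equations of Example 3 for [a0, a1, a2]."""
    x = np.asarray(x, dtype=float)
    # Columns are x_{3,k}, x_{2,k}, x_{1,k}, i.e. x_k, x_{k+1}, x_{k+2}.
    regressors = np.column_stack((x[:-3], x[1:-2], x[2:-1]))
    target = x[3:]                       # x_{1,k+1}, i.e. x_{k+3}
    gram = regressors.T @ regressors     # matrix of sums on the left-hand side
    cross = regressors.T @ target        # vector of sums on the right-hand side
    return np.linalg.solve(gram, -cross)

rng = np.random.default_rng(2)           # assumed seed
a0, a1, a2 = 0.12, -0.14, -0.7            # assumed stable coefficients
n = 20000
w = rng.standard_normal(n)
x = np.zeros(n + 3)
for k in range(n):
    # model (10) rearranged for x_{k+3}
    x[k + 3] = -a2 * x[k + 2] - a1 * x[k + 1] - a0 * x[k] + w[k]
a_hat = ar3_mle(x)
print(a_hat)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical design choice; it returns the same estimate as the inverse form.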


8.2.3. Variance Estimation 


MLEs can be similarly calculated for unknown variances, as is demonstrated by 
the following example. 


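Example 4. Consider a random variable generated by $x_k = \mu + w_k$ where $\mu \in \mathbb{R}$ is fixed and $w_k \in \mathbb{R}$ is assumed to be a zero-mean Gaussian white input sequence. Since $x_k \sim \mathcal{N}(\mu, \sigma_w^2)$, it follows that

$$\log f(\sigma_w^2 \mid x_k) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma_w^2 - \frac{1}{2}\sigma_w^{-2}\sum_{k=1}^{N}(x_k - \mu)^2$$

and

$$\frac{\partial \log f(\sigma_w^2 \mid x_k)}{\partial \sigma_w^2} = -\frac{N}{2}(\sigma_w^2)^{-1} + \frac{1}{2}(\sigma_w^2)^{-2}\sum_{k=1}^{N}(x_k - \mu)^2.$$

From the solution of $\frac{\partial \log f(\sigma_w^2 \mid x_k)}{\partial \sigma_w^2} = 0$, the MLE is

$$\hat{\sigma}_w^2 = \frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2, \quad \text{without replacement.} \tag{14}$$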


If the random samples are taken from a population without replacement, the 
samples are not independent, the covariance between two different samples is 
nonzero and the MLE (14) is biased. If the sampling is done with replacement 
then the sample values are independent and the following correction applies 


$$\hat{\sigma}_w^2 = \frac{1}{N-1}\sum_{k=1}^{N}(x_k - \mu)^2, \quad \text{with replacement.} \tag{15}$$


The corrected denominator within the above sample variance is only noticeable 
for small sample sizes, as the difference between (14) and (15) is negligible for 


“In science one tries to tell people, in such a way as to be understood by everyone, something that no- 
one knew before. But in poetry, it’s the exact opposite.” Paul Adrien Maurice Dirac 
large $N$. The estimate (15) is unbiased when the sample mean $\hat{\mu} = \frac{1}{N}\sum_{k=1}^{N} x_k$ is used in place of $\mu$, that is, its expected value equals the variance of the population. To confirm this property, observe that

$$\begin{aligned}
E\{\hat{\sigma}_w^2\} &= E\left\{ \frac{1}{N-1}\sum_{k=1}^{N}(x_k - \hat{\mu})^2 \right\} \\
&= E\left\{ \frac{1}{N-1}\sum_{k=1}^{N}\left( x_k^2 - 2\hat{\mu}x_k + \hat{\mu}^2 \right) \right\} \\
&= \frac{N}{N-1}\left( E\{x_k^2\} - E\{\hat{\mu}^2\} \right).
\end{aligned} \tag{16}$$

Using $E\{x_k^2\} = \sigma_w^2 + \mu^2$ and $E\{\hat{\mu}^2\} = \mu^2 + \sigma_w^2/N$ within (16) yields $E\{\hat{\sigma}_w^2\} = \sigma_w^2$ as required. Unless stated otherwise, it is assumed herein that the sample size is sufficiently large so that $N^{-1} \approx (N-1)^{-1}$ and (15) may be approximated by (14). A caution about modelling error contributing bias is mentioned below.
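The effect of the $N - 1$ denominator can be illustrated by averaging the two sample variances over many short records. This is a minimal Monte Carlo sketch, assuming NumPy and assuming that the mean is replaced by the sample mean of each record; the record length of 5, the number of trials and the seed are illustrative choices.

```python
import numpy as np

def sample_variance(x, corrected=True):
    """Sample variance about the sample mean with denominator N-1 or N."""
    x = np.asarray(x, dtype=float)
    squared_error = np.sum((x - x.mean()) ** 2)
    return squared_error / (x.size - 1) if corrected else squared_error / x.size

rng = np.random.default_rng(3)                 # assumed seed
trials, n, mu, var_w = 20000, 5, 1.0, 4.0      # assumed settings
data = mu + np.sqrt(var_w) * rng.standard_normal((trials, n))
avg_uncorrected = np.mean([sample_variance(row, corrected=False) for row in data])
avg_corrected = np.mean([sample_variance(row) for row in data])
print(avg_uncorrected, avg_corrected)
```

With such short records the uncorrected average settles near $(N-1)\sigma_w^2/N$, whereas the corrected average settles near $\sigma_w^2$; the gap closes as $N$ grows, which is why the two forms are treated as interchangeable for large samples.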


Example 5. Suppose that the states considered in Example 4 are actually generated by $x_k = \mu + w_k + s_k$, where $s_k$ is an independent input that accounts for the presence of modelling error. In this case, the assumption $x_k \sim \mathcal{N}(\mu, \sigma_w^2 + \sigma_s^2)$ leads to $\hat{\sigma}_w^2 + \hat{\sigma}_s^2 = \frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2$, in which case (14) is no longer an unbiased estimate of $\sigma_w^2$.


8.2.4 Cramér-Rao Lower Bound 


The Cramér-Rao Lower Bound (CRLB) establishes a limit of precision that can be achieved for any unbiased estimate of a parameter $\theta$. It actually defines a lower bound for the variance $\sigma_{\hat{\theta}}^2$ of $\hat{\theta}$. As is pointed out in [1], since $\hat{\theta}$ is assumed to be unbiased, the variance $\sigma_{\hat{\theta}}^2$ equals the parameter error variance. Determining


lower bounds for parameter error variances is useful for model selection. Another 
way of selecting models involves comparing residual error variances [23]. A lucid 
introduction to Gaussian CRLBs is presented in [2]. An extensive survey that 
refers to the pioneering contributions of Fisher, Cramér and Rao appears in [4]. 


The bounds on the parameter variances are found from the inverse of the so-called 
Fisher information. A formal definition of the CRLB for scalar parameters is as 
follows. 


Theorem 1 (Cramér-Rao Lower Bound) [2] - [5]: Assume that $f(\theta \mid x_k)$ satisfies the following regularity conditions:


“Everyone hears only what he understands.” Johann Wolfgang von Goethe 


G. A. Einicke, Smoothing, Filtering and Prediction: Estimating 209 
the Past, Present and Future (24 ed.), Prime Publishing, 2019 


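(i) $\frac{\partial \log f(\theta \mid x_k)}{\partial \theta}$ and $\frac{\partial^2 \log f(\theta \mid x_k)}{\partial \theta^2}$ exist for all $\theta$, and

(ii) $E\left\{ \frac{\partial \log f(\theta \mid x_k)}{\partial \theta} \right\} = 0$, for all $\theta$.

Define the Fisher information by

$$F(\theta) = -E\left\{ \frac{\partial^2 \log f(\theta \mid x_k)}{\partial \theta^2} \right\}, \tag{17}$$

where the derivative is evaluated at the actual value of $\theta$. Then the variance $\sigma_{\hat{\theta}}^2$ of an unbiased estimate $\hat{\theta}$ satisfies

$$\sigma_{\hat{\theta}}^2 \geq F^{-1}(\theta). \tag{18}$$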


Proofs for the above theorem appear in [2] — [5]. 


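Example 6. Suppose that samples of $x_k = \mu + w_k$ are available, where $w_k$ is a zero-mean Gaussian white input sequence and $\mu \in \mathbb{R}$ is unknown. Since $w_k \sim \mathcal{N}(0, \sigma_w^2)$,

$$\log f(\mu \mid x_k) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma_w^2 - \frac{1}{2}\sigma_w^{-2}\sum_{k=1}^{N}(x_k - \mu)^2$$

and

$$\frac{\partial \log f(\mu \mid x_k)}{\partial \mu} = \sigma_w^{-2}\sum_{k=1}^{N}(x_k - \mu).$$

Setting $\frac{\partial \log f(\mu \mid x_k)}{\partial \mu} = 0$ yields the MLE

$$\hat{\mu} = \frac{1}{N}\sum_{k=1}^{N} x_k, \tag{19}$$

which is unbiased because $E\{\hat{\mu}\} = E\left\{ \frac{1}{N}\sum_{k=1}^{N} x_k \right\} = \frac{1}{N}\sum_{k=1}^{N} E\{x_k\} = \mu$. From Theorem 1, the Fisher information is

$$F(\mu) = -E\left\{ \frac{\partial^2 \log f(\mu \mid x_k)}{\partial \mu^2} \right\} = -E\{ -N\sigma_w^{-2} \} = N\sigma_w^{-2}$$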
MU 


“Everyone has talent. What is rare is the courage to follow the talent to the dark place where it leads.” 
Erica Jong 
and therefore

$$\sigma_{\hat{\mu}}^2 \geq \sigma_w^2 / N. \tag{20}$$

The above inequality suggests that a minimum of one sample is sufficient to bound the variance of the MLE (19). It is also apparent from (20) that the lower bound on the error variance of $\hat{\mu}$ decreases with increasing sample size.
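The bound (20) can be compared with the scatter of the sample mean over repeated experiments. The sketch below assumes NumPy; the values $\mu = 2$, $\sigma_w^2 = 9$, $N = 25$, the number of trials and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)                 # assumed seed
trials, n, mu, var_w = 20000, 25, 2.0, 9.0     # assumed settings
x = mu + np.sqrt(var_w) * rng.standard_normal((trials, n))

mu_hat = x.mean(axis=1)            # MLE (19), one estimate per trial
empirical_var = mu_hat.var()       # observed variance of the estimates
crlb = var_w / n                   # lower bound (20)
print(mu_hat.mean(), empirical_var, crlb)
```

For this Gaussian case the sample mean is known to attain the bound, so the empirical variance should sit close to $\sigma_w^2/N$ rather than merely above it.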


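The CRLB is extended for estimating a vector of parameters $\theta_1, \theta_2, \ldots, \theta_M$ by defining the $M \times M$ Fisher information matrix

$$F_{ij}(\theta) = -E\left\{ \frac{\partial^2 \log f(\theta \mid x_k)}{\partial \theta_i\, \partial \theta_j} \right\} \tag{21}$$

for $i, j = 1, \ldots, M$. The parameter error variances are then bounded by the diagonal elements of the Fisher information matrix inverse

$$\sigma_{\hat{\theta}_i}^2 \geq F_{ii}^{-1}(\theta). \tag{22}$$

Formal vector CRLB theorems and accompanying proofs are detailed in [2] - [5].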


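Example 7. Consider the problem of estimating both $\mu$ and $\sigma_w^2$ from $N$ samples of $x_k = \mu + w_k$ with $w_k \sim \mathcal{N}(0, \sigma_w^2)$. Recall from Example 6 that

$$\log f(\mu, \sigma_w^2 \mid x_k) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma_w^2 - \frac{1}{2}\sigma_w^{-2}\sum_{k=1}^{N}(x_k - \mu)^2.$$

Therefore, $\frac{\partial \log f(\mu, \sigma_w^2 \mid x_k)}{\partial \mu} = \sigma_w^{-2}\sum_{k=1}^{N}(x_k - \mu)$ and $\frac{\partial^2 \log f(\mu, \sigma_w^2 \mid x_k)}{\partial \mu^2} = -N\sigma_w^{-2}$. In Example 4 it is found that $\frac{\partial \log f(\mu, \sigma_w^2 \mid x_k)}{\partial \sigma_w^2} = -\frac{N}{2}(\sigma_w^2)^{-1} + \frac{1}{2}(\sigma_w^2)^{-2}\sum_{k=1}^{N}(x_k - \mu)^2$, which implies

$$\frac{\partial^2 \log f(\mu, \sigma_w^2 \mid x_k)}{\partial (\sigma_w^2)^2} = \frac{N}{2}(\sigma_w^2)^{-2} - (\sigma_w^2)^{-3}\sum_{k=1}^{N}(x_k - \mu)^2,$$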


“Laying in bed this morning contemplating how amazing it would be if somehow Oscar Wilde and 
Mae West could twitter from the grave.” @DitaVonTeese 
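$\frac{\partial^2 \log f(\mu, \sigma_w^2 \mid x_k)}{\partial \mu\, \partial \sigma_w^2} = -(\sigma_w^2)^{-2}\sum_{k=1}^{N}(x_k - \mu)$, $E\left\{ \frac{\partial^2 \log f(\mu, \sigma_w^2 \mid x_k)}{\partial \mu\, \partial \sigma_w^2} \right\} = 0$ and $E\left\{ \frac{\partial^2 \log f(\mu, \sigma_w^2 \mid x_k)}{\partial (\sigma_w^2)^2} \right\} = -\frac{N}{2}\sigma_w^{-4}$.

The Fisher information matrix and its inverse are then obtained from (21) as

$$F(\mu, \sigma_w^2) = \begin{bmatrix} N\sigma_w^{-2} & 0 \\ 0 & \frac{N}{2}\sigma_w^{-4} \end{bmatrix}, \qquad F^{-1}(\mu, \sigma_w^2) = \begin{bmatrix} \sigma_w^2/N & 0 \\ 0 & 2\sigma_w^4/N \end{bmatrix}.$$

It is found from (22) that the lower bounds for the MLE variances are $\sigma_{\hat{\mu}}^2 \geq \sigma_w^2/N$ and $\sigma_{\hat{\sigma}_w^2}^2 \geq 2\sigma_w^4/N$. The impact of modelling error on parameter estimation accuracy is examined below.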
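Example 8. Consider the problem of estimating $\sigma_w^2$ given samples of states which are generated by $x_k = \mu + w_k + s_k$, where $s_k$ is an independent sequence that accounts for the presence of modelling error. From the assumption $x_k \sim \mathcal{N}(\mu, \sigma_w^2 + \sigma_s^2)$, the derivative of the associated log-likelihood function is

$$\frac{\partial \log f(\sigma_w^2 \mid x_k)}{\partial \sigma_w^2} = -\frac{N}{2}(\sigma_w^2 + \sigma_s^2)^{-1} + \frac{1}{2}(\sigma_w^2 + \sigma_s^2)^{-2}\sum_{k=1}^{N}(x_k - \mu)^2,$$

which leads to $E\left\{ \frac{\partial^2 \log f(\sigma_w^2 \mid x_k)}{\partial (\sigma_w^2)^2} \right\} = -\frac{N}{2}(\sigma_w^2 + \sigma_s^2)^{-2}$, that is, $\sigma_{\hat{\sigma}_w^2}^2 \geq 2(\sigma_w^2 + \sigma_s^2)^2 / N$. Thus, parameter estimation accuracy degrades as the variance of the modelling error increases. If $\sigma_s^2$ is available a priori then setting $\frac{\partial \log f(\sigma_w^2 \mid x_k)}{\partial \sigma_w^2} = 0$ leads to the improved estimate

$$\hat{\sigma}_w^2 = -\sigma_s^2 + \frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2.$$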
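The bias described in Examples 5 and 8, and its removal when the modelling-error variance is known, can be seen in a short simulation. The sketch assumes NumPy and Gaussian $w_k$ and $s_k$; the variances, sample size and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)                    # assumed seed
n, mu, var_w, var_s = 50000, 0.0, 1.0, 0.5        # assumed settings
x = (mu
     + np.sqrt(var_w) * rng.standard_normal(n)    # process input w_k
     + np.sqrt(var_s) * rng.standard_normal(n))   # modelling error s_k

naive = np.mean((x - mu) ** 2)        # formula (14), actually estimates var_w + var_s
corrected = naive - var_s             # improved estimate when var_s is known a priori
crlb = 2.0 * (var_w + var_s) ** 2 / n # lower bound from Example 8
print(naive, corrected, crlb)
```

The uncorrected value absorbs the modelling-error variance, and the bound grows with $\sigma_s^2$, so even the corrected estimate is noisier than it would be without modelling error.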


“There are only two kinds of people who are really fascinating; people who know absolutely 
everything, and people who know absolutely nothing.” Oscar Fingal O’Flahertie Wills Wilde 
8.3. Filtering EM Algorithms


8.3.1 Background 


The EM algorithm [3], [7], [11], [15] - [17], [19] - [22] is a general purpose technique for solving joint state and parameter estimation problems. In maximum-likelihood estimation, it is desired to estimate parameters $\theta_1, \theta_2, \ldots, \theta_M$, given states, by maximising the log-likelihood $\log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)$. When complete state information is not available to calculate the log-likelihood, the expectation of $\log f(\theta_1, \theta_2, \ldots, \theta_M \mid x_k)$, given incomplete measurements, $z_k$, is maximised instead.


This basic technique was in use prior to Dempster, Laird and Rubin naming it the EM algorithm in 1977 [11]. They published a general formulation of the algorithm, which consists of iterating an expectation step and a maximisation step. Their expectation step involves least squares calculations on the incomplete observations using the current parameter iterations to estimate the underlying states. In the maximisation step, the unknown parameters are re-estimated by maximising a joint log-likelihood function using state estimates from the previous expectation step. This sequence is repeated for either a finite number of iterations or until the estimates and the log-likelihood function are stable. Dempster, Laird and Rubin [11] also established parameter map conditions for the convergence of the algorithm, namely that the incomplete data log-likelihood function is monotonically nondecreasing.


Wu [16] subsequently noted an equivalence between the conditions for a map to 
be closed and the continuity of a function. In particular, if the likelihood function 
satisfies certain modality, continuity and differentiability conditions, the parameter 
sequence converges to some stationary value. A detailed analysis of Wu’s 
convergence results appears in [3]. Shumway and Stoffer [15] introduced a 
framework that is employed herein, namely, the use of a Kalman filter within the 
expectation step to recover the states. Feder and Weinstein [17] showed how a 
multiparameter estimation problem can be decoupled into separate maximum 
likelihood estimations within an EM algorithm. Some results on the convergence 
of EM algorithms for variance and state matrix estimation [19] — [20] are included 
within the developments below. 


8.3.2 Measurement Noise Variance Estimation 


8.3.2.1 EM Algorithm 


The problem of estimating parameters from incomplete information has been 
previously studied in [11] — [16]. It is noted in [11] that the likelihood functions 
for variance estimation do not exist in explicit closed form. This precludes 


“lm no model lady. A model is just an imitation of the real thing.” Mary (Mae) Jane West 
straightforward calculation of the Hessians required in [3] to assert convergence.
Therefore, an alternative analysis is presented here to establish the monotonicity 
of variance iterations. 


The expectation step described below employs the approach introduced in [15] 
and involves the use of a Kalman filter to obtain state estimates. The maximisation 
step requires the calculation of decoupled MLEs similarly to [17]. Measurements 
of a linear time-invariant system are modelled by 


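$$x_{k+1} = Ax_k + Bw_k, \tag{23}$$
$$y_k = Cx_k + Dw_k, \tag{24}$$
$$z_k = y_k + v_k, \tag{25}$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times m}$ and $w_k$, $v_k$ are stationary processes with $E\{w_k\} = 0$, $E\{w_j w_k^T\} = Q\delta_{jk}$, $E\{v_k\} = E\{w_j v_k^T\} = 0$ and $E\{v_j v_k^T\} = R\delta_{jk}$. To simplify the presentation, it is initially assumed that the direct feed-through matrix, $D$, is zero. A nonzero $D$ will be considered later.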


Suppose that it is desired to estimate $R = \mathrm{diag}(\sigma_{1,v}^2, \sigma_{2,v}^2, \ldots, \sigma_{p,v}^2)$ given $N$ samples of $z_k$ and $y_k$. Let $z_{i,k}$, $y_{i,k}$ and $v_{i,k}$ denote the $i$th element of the vectors $z_k$, $y_k$ and $v_k$, respectively. Then (25) may be written in terms of its $i$ components, $z_{i,k} = y_{i,k} + v_{i,k}$, that is,

$$v_{i,k} = z_{i,k} - y_{i,k}. \tag{26}$$

From the assumption $v_{i,k} \sim \mathcal{N}(0, \sigma_{i,v}^2)$, an MLE for the unknown $\sigma_{i,v}^2$ is obtained from the sample variance formula

$$\hat{\sigma}_{i,v}^2 = \frac{1}{N}\sum_{k=1}^{N}(z_{i,k} - y_{i,k})^2. \tag{27}$$

An EM algorithm for updating the measurement noise variance estimates is described as follows. Assume that there exists an estimate $\hat{R}^{(u)} = \mathrm{diag}\big((\hat{\sigma}_{1,v}^{(u)})^2, (\hat{\sigma}_{2,v}^{(u)})^2, \ldots, (\hat{\sigma}_{p,v}^{(u)})^2\big)$ of $R$ at iteration $u$. A Kalman filter designed with $\hat{R}^{(u)}$ may then be employed to produce corrected output estimates $\hat{y}_{k/k}^{(u)}$. The filter's design Riccati equation is given by


“There are known knowns. These are things we know that we know. There are known unknowns. That 
is to say, there are things that we know we don’t know. But there are also unknown unknowns. There 
are things we don’t know we don’t know.” Donald Henry Rumsfeld 
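$$P_{k+1/k}^{(u)} = (A - K_k^{(u)}C)P_{k/k-1}^{(u)}(A - K_k^{(u)}C)^T + K_k^{(u)}\hat{R}^{(u)}(K_k^{(u)})^T + BQB^T, \tag{28}$$

where $K_k^{(u)} = AP_{k/k-1}^{(u)}C^T(CP_{k/k-1}^{(u)}C^T + \hat{R}^{(u)})^{-1}$ is the predictor gain. The output estimates are calculated from

$$\begin{bmatrix} \hat{x}_{k+1/k}^{(u)} \\ \hat{x}_{k/k}^{(u)} \end{bmatrix} = \begin{bmatrix} A - K_k^{(u)}C & K_k^{(u)} \\ I - L_k^{(u)}C & L_k^{(u)} \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1}^{(u)} \\ z_k \end{bmatrix}, \tag{29}$$

$$\hat{y}_{k/k}^{(u)} = C\hat{x}_{k/k}^{(u)}, \tag{30}$$

where $L_k^{(u)} = P_{k/k-1}^{(u)}C^T(CP_{k/k-1}^{(u)}C^T + \hat{R}^{(u)})^{-1}$ is the filter gain.

Procedure 1 [19]. Assume that an initial estimate $\hat{R}^{(1)}$ of $R$ is available. Subsequent estimates, $\hat{R}^{(u)}$, $u > 1$, are calculated by repeating the following two-step procedure.

Step 1. Operate the Kalman filter (29) - (30) designed with $\hat{R}^{(u)}$ to obtain corrected output estimates $\hat{y}_{k/k}^{(u)}$.

Step 2. For $i = 1, \ldots, p$, use $\hat{y}_{i,k/k}^{(u)}$ instead of $y_{i,k}$ within (27) to obtain $\hat{R}^{(u+1)} = \mathrm{diag}\big((\hat{\sigma}_{1,v}^{(u+1)})^2, (\hat{\sigma}_{2,v}^{(u+1)})^2, \ldots, (\hat{\sigma}_{p,v}^{(u+1)})^2\big)$.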
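The two steps of Procedure 1 can be sketched for a scalar model as follows. This is only an illustration under stated assumptions: NumPy is assumed, $B = C = 1$ and $D = 0$ so that the gains reduce to scalars, the filter is initialised with a zero state and the stationary state variance, and the values $A = 0.9$, $\sigma_w^2 = 0.1$, $\sigma_v^2 = 10$ with an initial estimate of 14 are borrowed from Example 9. The function names and the seed are hypothetical.

```python
import numpy as np

def simulate_measurements(a, q, r, n, rng):
    """Generate z_k = x_k + v_k for x_{k+1} = a x_k + w_k (B = C = 1)."""
    w = np.sqrt(q) * rng.standard_normal(n)
    v = np.sqrt(r) * rng.standard_normal(n)
    x = np.zeros(n)
    for k in range(1, n):
        x[k] = a * x[k - 1] + w[k - 1]
    return x + v

def corrected_outputs(z, a, q, r_hat):
    """Step 1: scalar Kalman filter designed with r_hat; returns y_hat_{k/k}."""
    y_filt = np.zeros(len(z))
    x_pred, p_pred = 0.0, q / (1.0 - a * a)    # assumed initial conditions
    for k, z_k in enumerate(z):
        gain = p_pred / (p_pred + r_hat)        # filter gain L_k with C = 1
        x_filt = x_pred + gain * (z_k - x_pred) # corrected state estimate
        p_filt = (1.0 - gain) * p_pred
        y_filt[k] = x_filt                      # corrected output, C = 1
        x_pred = a * x_filt                     # one-step prediction
        p_pred = a * p_filt * a + q             # design Riccati recursion, B = 1
    return y_filt

def em_measurement_noise(z, a, q, r_init, iterations):
    """Iterate Steps 1 and 2; returns the list of variance estimates."""
    r_hist = [r_init]
    for _ in range(iterations):
        y_filt = corrected_outputs(z, a, q, r_hist[-1])     # Step 1
        r_hist.append(float(np.mean((z - y_filt) ** 2)))    # Step 2, formula (27)
    return r_hist

rng = np.random.default_rng(6)                  # assumed seed
z = simulate_measurements(a=0.9, q=0.1, r=10.0, n=20000, rng=rng)
r_hist = em_measurement_noise(z, a=0.9, q=0.1, r_init=14.0, iterations=4)
print(r_hist)
```

Each pass reuses the same measurement record and only changes the variance that the filter is designed with, which is what allows the iterates to be compared from one pass to the next.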


8.3.2.2 Properties 


The above EM algorithm involves a repetition of two steps: the states are deduced 
using the current variance estimates and then the variances are re-identified from 
the latest states. Consequently, a two-part argument is employed to establish the 
monotonicity of the variance sequence. For the expectation step, it is shown that 
monotonically non-increasing variance iterates lead to monotonically non- 
increasing error covariances. Then for the maximisation step, it is argued that 
monotonic error covariances result in a monotonic measurement noise variance 
sequence. The design Riccati difference equation (28) can be written as 


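$$P_{k+1/k}^{(u)} = (A - K_k^{(u)}C)P_{k/k-1}^{(u)}(A - K_k^{(u)}C)^T + K_k^{(u)}R(K_k^{(u)})^T + BQB^T + S_k^{(u)}, \tag{31}$$

where $S_k^{(u)} = K_k^{(u)}(\hat{R}^{(u)} - R)(K_k^{(u)})^T$ accounts for the presence of parameter error.

Subtracting $\hat{x}_{k/k}^{(u)}$ from $x_k$ yields

$$\tilde{x}_{k/k}^{(u)} = (I - L_k^{(u)}C)\tilde{x}_{k/k-1}^{(u)} - L_k^{(u)}v_k, \tag{32}$$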


“T want minimum information given with maximum politeness.” Jacqueline (Jackie) Lee Bouvier 
Kennedy Onassis 
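where $\tilde{x}_{k/k}^{(u)} = x_k - \hat{x}_{k/k}^{(u)}$ and $\tilde{x}_{k/k-1}^{(u)} = x_k - \hat{x}_{k/k-1}^{(u)}$ are the corrected and predicted state errors, respectively. The observed corrected error covariance is defined as $\Sigma_{k/k}^{(u)} = E\{\tilde{x}_{k/k}^{(u)}(\tilde{x}_{k/k}^{(u)})^T\}$ and obtained from

$$\begin{aligned}
\Sigma_{k/k}^{(u)} &= (I - L_k^{(u)}C)\Sigma_{k/k-1}^{(u)}(I - L_k^{(u)}C)^T + L_k^{(u)}R(L_k^{(u)})^T \\
&= \Sigma_{k/k-1}^{(u)} - \Sigma_{k/k-1}^{(u)}C^T(C\Sigma_{k/k-1}^{(u)}C^T + R)^{-1}C\Sigma_{k/k-1}^{(u)},
\end{aligned} \tag{33}$$

where $\Sigma_{k/k-1}^{(u)} = E\{\tilde{x}_{k/k-1}^{(u)}(\tilde{x}_{k/k-1}^{(u)})^T\}$. The observed predicted state error satisfies

$$\tilde{x}_{k+1/k}^{(u)} = A\tilde{x}_{k/k}^{(u)} + Bw_k. \tag{34}$$

Hence, the observed predicted error covariance obeys the recursion

$$\Sigma_{k+1/k}^{(u)} = A\Sigma_{k/k}^{(u)}A^T + BQB^T. \tag{35}$$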


Some observations concerning the above error covariances are described below. 
These results are used subsequently to establish the monotonicity of the above EM 
algorithm. 


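Lemma 1 [19]: In respect of Procedure 1 for estimating $R$, suppose the following:

(i) the data $z_k$ has been generated by (23) - (25) in which $A$, $B$, $C$, $Q$ are known, $|\lambda_i(A)| < 1$, $i = 1, \ldots, n$, and the pair $(A, C)$ is observable;

(ii) there exist $P_{1/0}^{(2)} \leq P_{1/0}^{(1)}$ and $R \leq \hat{R}^{(2)} \leq \hat{R}^{(1)}$ (or $P_{1/0}^{(1)} \leq P_{1/0}^{(2)}$ and $\hat{R}^{(1)} \leq \hat{R}^{(2)} \leq R$).

Then:

(i) $\Sigma_{k+1/k}^{(u)} \leq P_{k+1/k}^{(u)}$;

(ii) $\Sigma_{k/k}^{(u)} \leq P_{k/k}^{(u)}$;

(iii) $R \leq \hat{R}^{(u+1)} \leq \hat{R}^{(u)}$ implies $P_{k+1/k}^{(u+1)} \leq P_{k+1/k}^{(u)}$ (or $\hat{R}^{(u)} \leq \hat{R}^{(u+1)} \leq R$ implies $P_{k+1/k}^{(u)} \leq P_{k+1/k}^{(u+1)}$)

for all $u \geq 1$.

Proof:

(i) Condition (i) ensures that the problem is well-posed. Condition (ii) stipulates that $S_k^{(1)} \geq 0$, which is the initialisation step for an induction argument. For the inductive step, subtracting (33) from (31) yields $P_{k+1/k}^{(u)}$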


“The most technologically efficient machine that man has ever invented is the book.” Northrop Frye 
$-\ \Sigma_{k+1/k}^{(u)} = (A - K_k^{(u)}C)(P_{k/k-1}^{(u)} - \Sigma_{k/k-1}^{(u)})(A - K_k^{(u)}C)^T + S_k^{(u)}$ and thus $\Sigma_{k/k-1}^{(u)} \leq P_{k/k-1}^{(u)}$ implies $\Sigma_{k+1/k}^{(u)} \leq P_{k+1/k}^{(u)}$.

(ii) The result is immediate by considering $A = I$ within the proof for (i).

(iii) The condition $\hat{R}^{(u+1)} \leq \hat{R}^{(u)}$ ensures that

$$\begin{bmatrix} Q & A^T \\ A & -C^T(\hat{R}^{(u+1)})^{-1}C \end{bmatrix} \leq \begin{bmatrix} Q & A^T \\ A & -C^T(\hat{R}^{(u)})^{-1}C \end{bmatrix},$$

which together with $P_{1/0}^{(u+1)} \leq P_{1/0}^{(u)}$ within Theorem 2 of Chapter 7 results in $P_{k+1/k}^{(u+1)} \leq P_{k+1/k}^{(u)}$.


Thus the sequences of observed prediction and correction error covariances are 
bounded above by the design prediction and correction error covariances. Next, it 
is shown that the observed error covariances are monotonically non-increasing (or 
non-decreasing). 


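Lemma 2 [19]: Under the conditions of Lemma 1:

(i) $\Sigma_{k+1/k}^{(u+1)} \leq \Sigma_{k+1/k}^{(u)}$ (or $\Sigma_{k+1/k}^{(u)} \leq \Sigma_{k+1/k}^{(u+1)}$) and

(ii) $\Sigma_{k/k}^{(u+1)} \leq \Sigma_{k/k}^{(u)}$ (or $\Sigma_{k/k}^{(u)} \leq \Sigma_{k/k}^{(u+1)}$).

Proof: To establish that the solution of (33) is monotonic non-increasing, from Theorem 2 of Chapter 7, it is required to show that

$$\begin{bmatrix} Q + K_k^{(u+1)}R(K_k^{(u+1)})^T & (A - K_k^{(u+1)}C)^T \\ A - K_k^{(u+1)}C & 0 \end{bmatrix} \leq \begin{bmatrix} Q + K_k^{(u)}R(K_k^{(u)})^T & (A - K_k^{(u)}C)^T \\ A - K_k^{(u)}C & 0 \end{bmatrix}.$$

Since $A$, $Q$ and $R$ are time-invariant, it suffices to show that

$$\begin{bmatrix} L_k^{(u+1)}(L_k^{(u+1)})^T & (I - L_k^{(u+1)}C)^T \\ I - L_k^{(u+1)}C & 0 \end{bmatrix} \leq \begin{bmatrix} L_k^{(u)}(L_k^{(u)})^T & (I - L_k^{(u)}C)^T \\ I - L_k^{(u)}C & 0 \end{bmatrix}. \tag{36}$$

Note for an $X$ and $Y$ satisfying $I \geq Y \geq X \geq 0$ that $YY^T - XX^T \geq (I - X)(I - X)^T - (I - Y)(I - Y)^T$. Therefore, $\hat{R}^{(u+1)} \leq \hat{R}^{(u)}$ and $P_{k/k-1}^{(u+1)} \leq P_{k/k-1}^{(u)}$ (from Lemma 1) imply $L_k^{(u+1)}C \leq L_k^{(u)}C \leq I$ and thus (36) follows.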


It is established below that monotonic non-increasing error covariances result in a 
monotonic non-increasing measurement noise variance sequence. 


Lemma 3 [19]: In respect of Procedure 1 for estimating $R$, suppose the following:

(i) the data $z_k$ has been generated by (23) - (25) in which $A$, $B$, $C$, $Q$ are known, $|\lambda_i(A)| < 1$, $i = 1, \ldots, n$ and the pair $(A, C)$ is observable;


“The Internet is the world’s largest library. It’s just that all the books are on the floor.” John Allen 
Paulos 




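(ii) there exist $\hat{R}^{(1)} \geq R \geq 0$ and $P_{1/0}^{(u+1)} \leq P_{1/0}^{(u)}$ (or $P_{1/0}^{(u)} \leq P_{1/0}^{(u+1)}$) for all $u \geq 1$.

Then $\hat{R}^{(u+1)} \leq \hat{R}^{(u)}$ (or $\hat{R}^{(u)} \leq \hat{R}^{(u+1)}$) for all $u \geq 1$.

Proof: Let $C_i$ denote the $i$th row of $C$. The approximate MLE within Procedure 1 is written as

$$(\hat{\sigma}_{i,v}^{(u+1)})^2 = \frac{1}{N}\sum_{k=1}^{N}(z_{i,k} - C_i\hat{x}_{k/k}^{(u)})^2 \tag{37}$$

$$= \frac{1}{N}\sum_{k=1}^{N}(C_i\tilde{x}_{k/k}^{(u)} + v_{i,k})^2 \tag{38}$$

$$= C_i\Sigma_{k/k}^{(u)}C_i^T + \sigma_{i,v}^2, \tag{39}$$

and thus $\hat{R}^{(u+1)} = C\Sigma_{k/k}^{(u)}C^T + R$. Since $\hat{R}^{(u+1)}$ is affine to $\Sigma_{k/k}^{(u)}$, which from Lemma 2 is monotonically non-increasing, it follows that $\hat{R}^{(u+1)} \leq \hat{R}^{(u)}$.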


If the estimation problem is dominated by measurement noise, the measurement 
noise MLEs converge to the actual values. 


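Lemma 4 [19]: Under the conditions of Lemma 3,

$$\lim_{Q \to 0,\ R^{-1} \to 0,\ u \to \infty} \hat{R}^{(u)} = R. \tag{40}$$

Proof: By inspection of $L_k^{(u)} = P_{k/k-1}^{(u)}C^T(CP_{k/k-1}^{(u)}C^T + \hat{R}^{(u)})^{-1}$, it follows that $\lim_{Q \to 0,\ R^{-1} \to 0,\ u \to \infty} L_k^{(u)} = 0$. Therefore, $\lim_{Q \to 0,\ R^{-1} \to 0,\ u \to \infty} \hat{x}_{k/k}^{(u)} = 0$ and $\lim_{Q \to 0,\ R^{-1} \to 0} z_k = v_k$, which implies (40), since the MLE (37) is unbiased for large $N$.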


“Technology is so much fun but we can drown in our technology. The fog of information can drive out 
knowledge.” Daniel Joseph Boostin 
Fig. 1. Variance MLEs (27) versus iteration number for Example 9: (i) EM algorithm with $(\hat{\sigma}_v^{(1)})^2 = 14$, (ii) EM algorithm with $(\hat{\sigma}_v^{(1)})^2 = 12$, (iii) Newton-Raphson with $(\hat{\sigma}_v^{(1)})^2 = 14$ and (iv) Newton-Raphson with $(\hat{\sigma}_v^{(1)})^2 = 12$.


Example 9. In respect of the problem (23) - (25), assume $A = 0.9$, $B = C = 1$ and $\sigma_w^2 = 0.1$ are known. Suppose that $\sigma_v^2 = 10$ but is unknown. Samples $z_k$ and $\hat{x}_{k/k}^{(u)}$ were generated from $N = 20{,}000$ realisations of zero-mean Gaussian $w_k$ and $v_k$. The sequence of MLEs obtained using Procedure 1, initialised with $(\hat{\sigma}_v^{(1)})^2 = 14$ and 12 are indicated by traces (i) and (ii) of Fig. 1, respectively. The variance sequences are monotonically decreasing, which is consistent with Lemma 3. The figure shows that the MLEs converge (to a local maximum of the approximate log-likelihood function), and are reasonably close to the actual value of $\sigma_v^2 = 10$. This illustrates the high measurement noise observation described by Lemma 4. An alternative to the EM algorithm involves calculating MLEs using the Newton-Raphson method [5], [6]. The calculated Newton-Raphson measurement noise variance iterates, initialised with $(\hat{\sigma}_v^{(1)})^2 = 14$ and 12 are indicated by traces (iii) and (iv) of Fig. 1, respectively. It can be seen that the Newton-Raphson estimates converge to those of the EM algorithm, albeit at a slower rate.


8.3.3. Process Noise Variance Estimation 


8.3.3.1 EM Algorithm 


In respect of the model (23), suppose that it is desired to estimate $Q$ given $N$ samples of $x_{k+1}$. The vector states within (23) can be written in terms of its $i$ components, $x_{i,k+1} = A_i x_k + w_{i,k}$, that is

$$w_{i,k} = x_{i,k+1} - A_i x_k, \tag{41}$$


“Getting information off the internet is like taking a drink from a fire hydrant.” Mitchell David Kapor* 
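where $w_{i,k} = B_i w_k$, and $A_i$ and $B_i$ refer to the $i$th row of $A$ and $B$, respectively. Assume that $w_{i,k} \sim \mathcal{N}(0, \sigma_{i,w}^2)$, where $\sigma_{i,w}^2 \in \mathbb{R}$ is to be estimated. An MLE for the scalar $\sigma_{i,w}^2 = B_i Q B_i^T$ can be calculated from the sample variance formula

$$\hat{\sigma}_{i,w}^2 = \frac{1}{N}\sum_{k=1}^{N} w_{i,k}w_{i,k}^T \tag{42}$$

$$= \frac{1}{N}\sum_{k=1}^{N}(x_{i,k+1} - A_i x_k)(x_{i,k+1} - A_i x_k)^T \tag{43}$$

$$= \frac{1}{N}\sum_{k=1}^{N} B_i w_k w_k^T B_i^T \tag{44}$$

$$= B_i\left( \frac{1}{N}\sum_{k=1}^{N} w_k w_k^T \right) B_i^T. \tag{45}$$

Substituting $B_i w_k = x_{i,k+1} - A_i x_k$ into (45) and noting that $\sigma_{i,w}^2 = B_i Q B_i^T$ yields

$$\hat{\sigma}_{i,w}^2 = \frac{1}{N}\sum_{k=1}^{N}(x_{i,k+1} - A_i x_k)(x_{i,k+1} - A_i x_k)^T, \tag{46}$$

which can be updated as follows.

Procedure 2 [19]. Assume that an initial estimate $\hat{Q}^{(1)}$ of $Q$ is available. Subsequent estimates can be found by repeating the following two-step algorithm.

Step 1. Operate the filter recursions (29) designed with $\hat{Q}^{(u)}$ on the measurements (25) over $k \in [1, N]$ to obtain corrected state estimates $\hat{x}_{k/k}^{(u)}$ and $\hat{x}_{k+1/k+1}^{(u)}$.

Step 2. For $i = 1, \ldots, n$, use $\hat{x}_{k/k}^{(u)}$ and $\hat{x}_{k+1/k+1}^{(u)}$ instead of $x_k$ and $x_{k+1}$ within (46) to obtain $\hat{Q}^{(u+1)} = \mathrm{diag}\big((\hat{\sigma}_{1,w}^{(u+1)})^2, (\hat{\sigma}_{2,w}^{(u+1)})^2, \ldots, (\hat{\sigma}_{n,w}^{(u+1)})^2\big)$.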
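Procedure 2 can be sketched in the same scalar setting. The code below is an illustration only, assuming NumPy, $B = C = 1$, a zero initial state estimate and a stationary initial covariance; the values $A = 0.9$, $\sigma_v^2 = 0.01$, $\sigma_w^2 = 0.1$ and the initial estimate 0.14 follow Example 10, while the function names and seed are hypothetical.

```python
import numpy as np

def simulate_measurements(a, q, r, n, rng):
    """Generate z_k = x_k + v_k for x_{k+1} = a x_k + w_k (B = C = 1)."""
    w = np.sqrt(q) * rng.standard_normal(n)
    v = np.sqrt(r) * rng.standard_normal(n)
    x = np.zeros(n)
    for k in range(1, n):
        x[k] = a * x[k - 1] + w[k - 1]
    return x + v

def corrected_states(z, a, q_hat, r):
    """Step 1: scalar Kalman filter designed with q_hat; returns x_hat_{k/k}."""
    x_filt = np.zeros(len(z))
    x_pred, p_pred = 0.0, q_hat / (1.0 - a * a)   # assumed initial conditions
    for k, z_k in enumerate(z):
        gain = p_pred / (p_pred + r)               # filter gain L_k with C = 1
        x_filt[k] = x_pred + gain * (z_k - x_pred)
        p_filt = (1.0 - gain) * p_pred
        x_pred = a * x_filt[k]                     # one-step prediction
        p_pred = a * p_filt * a + q_hat            # design Riccati recursion
    return x_filt

def em_process_noise(z, a, r, q_init, iterations):
    """Iterate Steps 1 and 2; returns the list of variance estimates."""
    q_hist = [q_init]
    for _ in range(iterations):
        x_filt = corrected_states(z, a, q_hist[-1], r)        # Step 1
        residual = x_filt[1:] - a * x_filt[:-1]                # x_hat_{k+1/k+1} - A x_hat_{k/k}
        q_hist.append(float(np.mean(residual ** 2)))          # Step 2, formula (46)
    return q_hist

rng = np.random.default_rng(7)                     # assumed seed
z = simulate_measurements(a=0.9, q=0.1, r=0.01, n=20000, rng=rng)
q_hist = em_process_noise(z, a=0.9, r=0.01, q_init=0.14, iterations=4)
print(q_hist)
```

Because the residual is formed from filtered rather than true states, the iterates need not land exactly on the true variance even for long records.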


8.3.3.2 Properties 


Similarly to Lemma 1, it can be shown that a monotonically non-increasing (or 
non-decreasing) sequence of process noise variance estimates results in a 
monotonically non-increasing (or non-decreasing) sequence of design and 
observed error covariances, see [19]. The converse case is stated below, namely, 
the sequence of variance iterates is monotonically non-increasing, provided the 


“Information on the Internet is subject to the same rules and regulations as conversations at a bar.” 
George David Lundberg 




estimates and error covariances are initialised appropriately. The accompanying 
proof makes use of 


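$$\begin{aligned}
\hat{x}_{k+1/k+1}^{(u)} - A\hat{x}_{k/k}^{(u)} &= \hat{x}_{k+1/k}^{(u)} + L_{k+1}^{(u)}(z_{k+1} - C\hat{x}_{k+1/k}^{(u)}) - A\hat{x}_{k/k}^{(u)} \\
&= A\hat{x}_{k/k}^{(u)} + L_{k+1}^{(u)}(z_{k+1} - C\hat{x}_{k+1/k}^{(u)}) - A\hat{x}_{k/k}^{(u)} \\
&= L_{k+1}^{(u)}(C\tilde{x}_{k+1/k}^{(u)} + v_{k+1}).
\end{aligned} \tag{47}$$

The components of (47) are written as

$$\hat{x}_{i,k+1/k+1}^{(u)} - A_i\hat{x}_{k/k}^{(u)} = L_{i,k+1}^{(u)}(C\tilde{x}_{k+1/k}^{(u)} + v_{k+1}), \tag{48}$$

where $L_{i,k+1}^{(u)}$ is the $i$th row of $L_{k+1}^{(u)}$.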


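Lemma 5 [19]: In respect of Procedure 2 for estimating $Q$, suppose the following:

(i) the data $z_k$ has been generated by (23) - (25) in which $A$, $B$, $C$, $R$ are known, $|\lambda_i(A)| < 1$, $i = 1, \ldots, n$ and the pair $(A, C)$ is observable;

(ii) there exist $\hat{Q}^{(1)} \geq Q \geq 0$ and $P_{1/0}^{(u+1)} \leq P_{1/0}^{(u)}$ (or $P_{1/0}^{(u)} \leq P_{1/0}^{(u+1)}$) for all $u \geq 1$.

Then $\hat{Q}^{(u+1)} \leq \hat{Q}^{(u)}$ (or $\hat{Q}^{(u)} \leq \hat{Q}^{(u+1)}$) for all $u \geq 1$.

Proof: Using (47) within (46) gives

$$\begin{aligned}
(\hat{\sigma}_{i,w}^{(u+1)})^2 &= \frac{1}{N}\sum_{k=1}^{N} L_{i,k+1}^{(u)}(C\tilde{x}_{k+1/k}^{(u)} + v_{k+1})(C\tilde{x}_{k+1/k}^{(u)} + v_{k+1})^T(L_{i,k+1}^{(u)})^T \\
&= L_{i,k+1}^{(u)}(C\Sigma_{k+1/k}^{(u)}C^T + R)(L_{i,k+1}^{(u)})^T
\end{aligned} \tag{49}$$

and thus $\hat{Q}^{(u+1)} = L_{k+1}^{(u)}(C\Sigma_{k+1/k}^{(u)}C^T + R)(L_{k+1}^{(u)})^T$. Since $\hat{Q}^{(u+1)}$ varies with $L_{k+1}^{(u)}(L_{k+1}^{(u)})^T$ and $\Sigma_{k+1/k}^{(u)}$, which from Lemma 2 are monotonically non-increasing, it follows that $\hat{Q}^{(u+1)} \leq \hat{Q}^{(u)}$.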


It is observed that the approximate MLEs asymptotically approach the actual 
values when the SNR is sufficiently high. 


“T must confess that I’ve never trusted the Web. I’ve always seen it as a coward’s tool. Where does it 
live? How do you hold it personally responsible? Can you put a distributed network of fibre-optic cable 
on notice? And is it male or female? In other words, can I challenge it to a fight?” Stephen Tyrone 
Colbert 
Lemma 6 [19]: Under the conditions of Lemma 5,

$$\lim_{Q^{-1} \to 0,\ R \to 0,\ u \to \infty} \hat{Q}^{(u)} = Q. \tag{50}$$

Proof: It is straightforward to show that $\lim_{Q^{-1} \to 0,\ R \to 0} L_k^{(u)}C = I$ and therefore $\lim_{Q^{-1} \to 0,\ R \to 0,\ u \to \infty} \hat{x}_{k/k}^{(u)} = x_k$, which implies (50), since the MLE (46) is unbiased for large $N$.


Example 10. For the model described in Example 9, suppose that $\sigma_v^2 = 0.01$ is known, and $\sigma_w^2 = 0.1$ but is unknown. Procedure 2 and the Newton-Raphson method [5], [6] were used to jointly estimate the states and the unknown variance. Some example variance iterations, initialised with $(\hat{\sigma}_w^{(1)})^2 = 0.14$ and 0.12, are shown in Fig. 2. The EM algorithm estimates are seen to be monotonically decreasing, which is in agreement with Lemma 5. At the final iteration, the approximate MLEs do not quite reach the actual value of $\sigma_w^2 = 0.1$, because the presence of measurement noise results in imperfect state reconstruction and introduces a small bias (see Example 5). The figure also shows that MLEs calculated via the Newton-Raphson method converge at a slower rate.


Fig. 2. Variance MLEs (46) versus iteration number for Example 10: (i) EM algorithm with $(\hat{\sigma}_w^{(1)})^2 = 0.14$, (ii) EM algorithm with $(\hat{\sigma}_w^{(1)})^2 = 0.12$, (iii) Newton-Raphson with $(\hat{\sigma}_w^{(1)})^2 = 0.14$ and (iv) Newton-Raphson with $(\hat{\sigma}_w^{(1)})^2 = 0.12$.


Example 11. Consider the problem of calculating the initial alignment of an 
inertial navigation system. Alignment is the process of estimating the Earth 
rotation rate and rotating the attitude direction cosine matrix, so that it transforms 
the body-frame sensor signals to a locally-level frame, wherein certain 
components of accelerations and velocities approach zero when the platform is 


“Four years ago nobody but nuclear physicists had ever heard of the Internet. Today even my cat, 
Socks, has his own web page. I’m amazed at that. I meet kids all the time, been talking to my cat on the 
Internet.” William Jefferson (Bill) Clinton 
stationary. This can be achieved by a Kalman filter that uses the model (23), where $x_k \in \mathbb{R}^4$ comprises the errors in Earth rotation rate, tilt, velocity and position vectors respectively, and $w_k \in \mathbb{R}^4$ is a deterministic signal which is a nonlinear function of the states (see [24]). The state matrix is calculated as $A = I + \Phi T_s + \frac{1}{2!}(\Phi T_s)^2 + \frac{1}{3!}(\Phi T_s)^3$, where $T_s$ is the sampling interval,

$$\Phi = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & g & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

is a continuous-time state matrix and $g$ is the acceleration due to gravity. The output mapping within (24) is $C = [0\ \ 0\ \ 0\ \ 1]$. Raw


three-axis accelerometer and gyro data was recorded from a stationary Litton 
LN270 Inertial Navigation System at a 500 Hz data rate. In order to generate a 
compact plot, the initial variance estimates were selected to be 10 times the steady 
state values. 
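The discretisation of the state matrix in Example 11 can be sketched as below. NumPy is assumed; the value $g = 9.81$ m/s² and the sampling interval of 1/500 s (taken from the stated 500 Hz data rate) are assumptions made for illustration, and the function name is hypothetical. Since the continuous-time matrix is strictly lower triangular, its fourth power vanishes and the third-order series is not a truncation.

```python
import numpy as np

def alignment_state_matrix(ts, g=9.81):
    """Return (A, Phi) with A = I + Phi*ts + (Phi*ts)^2/2! + (Phi*ts)^3/3!."""
    phi = np.array([[0.0, 0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0, 0.0],
                    [0.0,   g, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
    pt = phi * ts
    a = np.eye(4) + pt + pt @ pt / 2.0 + pt @ pt @ pt / 6.0
    return a, phi

ts = 1.0 / 500.0                       # assumed sampling interval for 500 Hz data
A, Phi = alignment_state_matrix(ts)
print(A)
print(np.linalg.matrix_power(Phi, 4))  # all zeros: higher-order terms vanish
```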


Fig. 3. (i) $\hat{\sigma}_{1,w}^2$, (ii) $\hat{\sigma}_{2,w}^2$, (iii) $\hat{\sigma}_{3,w}^2$ and (iv) $\hat{\sigma}_{4,w}^2$, normalised by their steady state values, versus EM iteration number for Example 11.

Fig. 4. Estimated magnitude of Earth rotation rate [rad/sec] for Example 11.


The estimated variances after 10 EM iterations are shown in Fig. 3. The figure 
demonstrates that approximate MLEs (46) approach steady state values from 
above, which is consistent with Lemma 5. The estimated Earth rotation rate 
magnitude versus time is shown in Fig. 4. At 100 seconds, the estimated 
magnitude of the Earth rate is 72.53 micro-radians per second, that is, one 
revolution every 24.06 hours. This estimated Earth rate is about 0.5% in error 
compared with the mean sidereal day of 23.93 hours [25]. Since the estimated 
Earth rate is in reasonable agreement, it is suggested that the MLEs for the 
unknown variances are satisfactory (see [19] for further details). 


“On the Internet, nobody knows you’re a dog.” Peter Steiner 
8.3.4 State Matrix Estimation


8.3.4.1 EM Algorithm 


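The components of the states within (23) are now written as

$$x_{i,k+1} = \sum_{j=1}^{n} a_{i,j}x_{j,k} + w_{i,k}, \tag{51}$$

where $a_{i,j}$ denotes the element in row $i$ and column $j$ of $A$. Consider the problem of estimating $a_{i,j}$ from samples of $x_{i,k}$. The assumption $x_{i,k+1} \sim \mathcal{N}\big(\sum_{j=1}^{n} a_{i,j}x_{j,k},\ \sigma_{i,w}^2\big)$ leads to the log-likelihood

$$\log f(a_{i,j} \mid x_{i,k+1}) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma_{i,w}^2 - \frac{1}{2}\sigma_{i,w}^{-2}\sum_{k=1}^{N}\Big( x_{i,k+1} - \sum_{j=1}^{n} a_{i,j}x_{j,k} \Big)^2. \tag{52}$$

By setting $\frac{\partial \log f(a_{i,j} \mid x_{i,k+1})}{\partial a_{i,j}} = 0$, an MLE for $a_{i,j}$ is obtained as [20]

$$\hat{a}_{i,j} = \frac{\sum_{k=1}^{N}\Big( x_{i,k+1} - \sum_{l=1,\, l \neq j}^{n} a_{i,l}x_{l,k} \Big)x_{j,k}}{\sum_{k=1}^{N} x_{j,k}^2}. \tag{53}$$

Incidentally, the above estimate can also be found using the least-squares method [2], [10] and minimising the cost function $\sum_{k=1}^{N}\big( x_{i,k+1} - \sum_{j=1}^{n} a_{i,j}x_{j,k} \big)^2$. The expectation of $\hat{a}_{i,j}$ is [20]

$$E\{\hat{a}_{i,j}\} = E\left\{ \frac{\sum_{k=1}^{N}\Big( \sum_{l=1}^{n} a_{i,l}x_{l,k} + w_{i,k} - \sum_{l=1,\, l \neq j}^{n} a_{i,l}x_{l,k} \Big)x_{j,k}}{\sum_{k=1}^{N} x_{j,k}^2} \right\}$$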


“Tt’s important for us to explain to our nation that life is important. It’s not only the life of babies, but 
it’s life of children living in, you know, the dark dungeons of the internet.” George Walker Bush 
$= a_{i,j}$,

since $w_{i,k}$ and $x_{j,k}$ are independent. Hence, the MLE (53) is unbiased.


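Suppose that an estimate $\hat{A}^{(u)} = \{\hat{a}^{(u)}_{i,j}\}$ of A is available at an iteration u. The
predicted state estimates within (29) can be calculated from

$$\hat{x}^{(u)}_{k+1/k} = (\hat{A}^{(u)} - K^{(u)}_{k} C)\hat{x}^{(u)}_{k/k-1} + K^{(u)}_{k} z_{k},$$ (54)

where $K^{(u)}_{k} = \hat{A}^{(u)} P^{(u)}_{k/k-1} C^{T}(C P^{(u)}_{k/k-1} C^{T} + R)^{-1}$, in which $P^{(u)}_{k/k-1}$ is obtained from the
design Riccati equation

$$P^{(u)}_{k+1/k} = (\hat{A}^{(u)} - K^{(u)}_{k} C) P^{(u)}_{k/k-1} (\hat{A}^{(u)} - K^{(u)}_{k} C)^{T} + K^{(u)}_{k} R (K^{(u)}_{k})^{T} + Q.$$ (55)

An approximate MLE for $a_{i,j}$ is obtained by replacing $x_{k}$ by $\hat{x}^{(u)}_{k/k}$ within (53), which
results in

$$\hat{a}^{(u+1)}_{i,j} = \frac{\sum_{k=1}^{N}\big(\hat{x}^{(u)}_{i,k+1/k+1} - \sum_{j=1, j \neq i}^{n} \hat{a}^{(u)}_{i,j} \hat{x}^{(u)}_{j,k/k}\big)\hat{x}^{(u)}_{j,k/k}}{\sum_{k=1}^{N} (\hat{x}^{(u)}_{j,k/k})^{2}}.$$ (56)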

An iterative procedure for re-estimating an unknown state matrix is proposed 
below. 


Procedure 3 [20]. Assume that there exists an initial estimate $\hat{A}^{(1)}$ satisfying
$|\lambda_{i}(\hat{A}^{(1)})| < 1$, i = 1, ..., n. Subsequent estimates are calculated using the
following two-step EM algorithm.
Step 1. Operate the Kalman filter (29) using (54) on the measurements $z_{k}$ over k
∈ [1, N] to obtain corrected state estimates $\hat{x}^{(u)}_{k/k}$ and $\hat{x}^{(u)}_{k+1/k+1}$.
Step 2. Copy $\hat{A}^{(u)}$ to $\hat{A}^{(u+1)}$. Use $\hat{x}^{(u)}_{k/k}$ within (56) to obtain candidate estimates
$\hat{a}^{(u+1)}_{i,j}$, i, j = 1, ..., n. Include $\hat{a}^{(u+1)}_{i,j}$ within $\hat{A}^{(u+1)}$ if $|\lambda_{i}(\hat{A}^{(u+1)})| < 1$, i =
1, ..., n.
The condition $|\lambda_{i}(\hat{A}^{(u+1)})| < 1$ within Step 2 ensures that the estimated system is
asymptotically stable.
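
The two steps of Procedure 3 can be sketched for a scalar system with B = C = 1. This is an illustrative sketch under stated assumptions rather than the implementation of [20]: the filter is written in the equivalent predictor-corrector form instead of (54), the function names are hypothetical, and the simulation settings loosely follow Example 12 with a much shorter data record.

```python
import numpy as np

def kalman_filtered_states(z, a_hat, q, r):
    """Scalar Kalman filter (C = 1) designed with a_hat; returns x_hat_{k/k}."""
    x_pred, p_pred = 0.0, q
    x_filt = np.empty(len(z))
    for k, zk in enumerate(z):
        gain = p_pred / (p_pred + r)              # filter gain
        x_filt[k] = x_pred + gain * (zk - x_pred)
        p_filt = (1.0 - gain) * p_pred
        x_pred = a_hat * x_filt[k]                # one-step prediction
        p_pred = a_hat * p_filt * a_hat + q
    return x_filt

def em_state_matrix(z, a_init, q, r, iterations=5):
    """Filtering EM iterations for a scalar state matrix."""
    a_hat = a_init
    history = [a_hat]
    for _ in range(iterations):
        xf = kalman_filtered_states(z, a_hat, q, r)                  # Step 1
        candidate = np.sum(xf[1:] * xf[:-1]) / np.sum(xf[:-1] ** 2)  # Step 2: scalar form of (56)
        if abs(candidate) < 1.0:                  # accept only stable candidates
            a_hat = float(candidate)
        history.append(a_hat)
    return history

# Assumed simulation settings for the illustration.
rng = np.random.default_rng(0)
a_true, q, r, N = 0.6, 0.2, 0.001, 20_000
x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = a_true * x[k] + rng.normal(0.0, np.sqrt(q))
z = x + rng.normal(0.0, np.sqrt(r), N)

history = em_state_matrix(z, a_init=0.9999, q=q, r=r)
print(history)
```

The stability test in Step 2 reduces to a magnitude check because the state matrix is a scalar here; a matrix version would test the eigenvalues of the candidate matrix instead.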


8.3.4.2 Properties


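The design Riccati difference equation (55) can be written as

$$P^{(u)}_{k+1/k} = (A - K^{(u)}_{k} C) P^{(u)}_{k/k-1} (A - K^{(u)}_{k} C)^{T} + K^{(u)}_{k} R (K^{(u)}_{k})^{T} + Q + S^{(u)}_{k},$$ (57)

where

$$S^{(u)}_{k} = (\hat{A}^{(u)} - K^{(u)}_{k} C) P^{(u)}_{k/k-1} (\hat{A}^{(u)} - K^{(u)}_{k} C)^{T} - (A - K^{(u)}_{k} C) P^{(u)}_{k/k-1} (A - K^{(u)}_{k} C)^{T}$$ (58)

accounts for the presence of modelling error. In the following, the notation of
Lemma 1 is employed to argue that a monotonically non-increasing state matrix
estimate sequence results in monotonically non-increasing error covariance
sequences.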


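Lemma 7 [20]. In respect of Procedure 3 for estimating A, suppose the following:
(i) the data $z_{k}$ has been generated by (23) - (25) in which B, C, Q, R are
known;
(ii) $|\lambda_{i}(A)| < 1$, i = 1, ..., n, the pair (A, C) is observable;
(iii) there exist $\hat{A}^{(1)} \geq A$ and $P^{(u+1)}_{1/0} \leq P^{(u)}_{1/0}$ (or $P^{(u)}_{1/0} \leq P^{(u+1)}_{1/0}$) for all u > 1.
Then:
(i) $P^{(u+1)}_{k+1/k} \leq P^{(u)}_{k+1/k}$ (or $P^{(u)}_{k+1/k} \leq P^{(u+1)}_{k+1/k}$);
(ii) $P^{(u+1)}_{k/k} \leq P^{(u)}_{k/k}$ (or $P^{(u)}_{k/k} \leq P^{(u+1)}_{k/k}$);
(iii) $\hat{A}^{(u+1)} \leq \hat{A}^{(u)}$ which implies $P^{(u+1)}_{k+1/k} \leq P^{(u)}_{k+1/k}$ ($\hat{A}^{(u)} \leq \hat{A}^{(u+1)}$ which implies
$P^{(u)}_{k+1/k} \leq P^{(u+1)}_{k+1/k}$)
for all u ≥ 1.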


The proof follows mutatis mutandis from that of Lemma 1. A heuristic argument
is outlined below which suggests that non-increasing error variances lead to a
non-increasing state matrix estimate sequence. Suppose that there exists a residual
error $s^{(u)}_{k} \in R^{n}$ at iteration u such that

$$\hat{x}^{(u)}_{k+1/k+1} = \hat{A}^{(u)} \hat{x}^{(u)}_{k/k} + s^{(u)}_{k}.$$ (59)

The components of (59) are denoted by

$$\hat{x}^{(u)}_{i,k+1/k+1} = \sum_{j=1}^{n} \hat{a}^{(u)}_{i,j} \hat{x}^{(u)}_{j,k/k} + s^{(u)}_{i,k},$$ (60)

where $s^{(u)}_{i,k}$ is the i-th element of $s^{(u)}_{k}$. It follows from (60) and (48) that

$$s^{(u)}_{k} = L^{(u)}_{k+1}(C\tilde{x}^{(u)}_{k+1/k} + v_{k+1})$$ (61)

and

$$s^{(u)}_{i,k} = L^{(u)}_{i,k+1}(C\tilde{x}^{(u)}_{k+1/k} + v_{k+1}).$$ (62)


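Using (60) and (62) within (56) yields

$$\hat{a}^{(u+1)}_{i,j} = \hat{a}^{(u)}_{i,j} + \Big(\sum_{k=1}^{N} s^{(u)}_{i,k} \hat{x}^{(u)}_{j,k/k}\Big)\Big(\sum_{k=1}^{N} (\hat{x}^{(u)}_{j,k/k})^{2}\Big)^{-1}$$
$$= \hat{a}^{(u)}_{i,j} + \Big(\sum_{k=1}^{N} L^{(u)}_{i,k+1} C (\tilde{x}^{(u)}_{k+1/k} + C^{\#} v_{k+1}) \hat{x}^{(u)}_{j,k/k}\Big)\Big(\sum_{k=1}^{N} (\hat{x}^{(u)}_{j,k/k})^{2}\Big)^{-1},$$ (63)

where $C^{\#}$ denotes the Moore-Penrose pseudo-inverse of C. It is shown in Lemma 2
under prescribed conditions that $L^{(u+1)}C \leq L^{(u)}C \leq I$. Since the non-increasing
sequence $L^{(u)}C$ is a factor of the second term on the right-hand-side of (63), the
sequence $\hat{a}^{(u+1)}_{i,j}$ is expected to be non-increasing.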


Lemma 8 [20]: Under the conditions of Lemma 7, suppose that C is full rank, then

$$\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \hat{A}^{(u)} = A.$$ (64)

Proof: It is straightforward to show that $\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} L^{(u)}_{k} C = I$ and therefore
$\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \hat{x}^{(u)}_{k/k} = x_{k}$, which implies (64) since the MLE (53) is unbiased.


An illustration is presented below. 


Fig. 5. Sequence of $\hat{A}^{(u)}$ versus iteration number for Example 12.


Example 12. In respect of the model (23) - (25), suppose that B = C = 1, $\sigma_{w}^{2}$ =
0.2 are known and A = 0.6 is unknown. Simulations were conducted with 100
realisations of Gaussian process noise and measurement noise of length N =
500,000 for R = 0.1, 0.01 and 0.001. The EM algorithms were initialised with $\hat{A}^{(1)}$
= 0.9999. It was observed that the resulting estimate sequences were all
monotonically decreasing; however, this becomes imperceptible at R = 0.001, due
to the limited resolution of the plot. The mean estimates are shown in Fig. 5. As
expected from Lemma 8, $\hat{A}^{(u)}$ asymptotically approaches the true value of A = 0.6
when the measurement noise becomes negligible.


8.4 Smoothing EM Algorithms 
8.4.1 Process Noise Variance Estimation 


8.4.1.1 EM Algorithm 


In the previous EM algorithms, the expectation step involved calculating filtered 
estimates. Similar EM procedures are outlined here where smoothed estimates are 
used at iteration u within the expectation step. The likelihood functions described 
in Sections 8.2.2 and 8.2.3 are exact, provided that the underlying assumptions are 
correct and actual random variables are available. Under these conditions, the 
ensuing parameter estimates maximise the likelihood functions and their limit of 
precision is specified by the associated CRLBs. However, the use of filtered or 
smoothed quantities leads to approximate likelihood functions, MLEs and CRLBs. 
It turns out that the approximate MLEs approach the true parameter values under 
prescribed SNR conditions. It will be shown that the use of smoothed (as opposed
to filtered) quantities results in smaller approximate CRLBs, which suggests
improved parameter estimation accuracy.


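Suppose that the system $\mathcal{G}$ having the realisation (23) - (24) is non-minimum
phase and D is of full rank. Under these conditions $\mathcal{G}^{-1}$ exists and the minimum-
variance smoother (described in Chapter 7) may be employed to produce input
estimates. Assume that an estimate $\hat{Q}^{(u)} = diag((\hat{\sigma}^{(u)}_{1,w})^{2}, (\hat{\sigma}^{(u)}_{2,w})^{2}, ..., (\hat{\sigma}^{(u)}_{n,w})^{2})$
of Q is available at iteration u. The smoothed input estimates, $\hat{w}^{(u)}_{k/N}$, are
calculated from

$$\begin{bmatrix} \hat{x}^{(u)}_{k+1/k} \\ \alpha^{(u)}_{k} \end{bmatrix} = \begin{bmatrix} A_{k} - K^{(u)}_{k} C_{k} & K^{(u)}_{k} \\ -(\Omega^{(u)}_{k})^{-1/2} C_{k} & (\Omega^{(u)}_{k})^{-1/2} \end{bmatrix} \begin{bmatrix} \hat{x}^{(u)}_{k/k-1} \\ z_{k} \end{bmatrix},$$ (65)

$$\begin{bmatrix} \xi^{(u)}_{k-1} \\ \gamma^{(u)}_{k-1} \\ \hat{w}^{(u)}_{k+1/N} \end{bmatrix} = \begin{bmatrix} A_{k}^{T} - C_{k}^{T}(K^{(u)}_{k})^{T} & 0 & C_{k}^{T}(\Omega^{(u)}_{k})^{-1/2} \\ C_{k}^{T}(K^{(u)}_{k})^{T} & A_{k}^{T} & -C_{k}^{T}(\Omega^{(u)}_{k})^{-1/2} \\ \hat{Q}^{(u)}_{k} D_{k}^{T}(K^{(u)}_{k})^{T} & \hat{Q}^{(u)}_{k} B_{k}^{T} & \hat{Q}^{(u)}_{k} D_{k}^{T}(\Omega^{(u)}_{k})^{-1/2} \end{bmatrix} \begin{bmatrix} \xi^{(u)}_{k} \\ \gamma^{(u)}_{k} \\ \alpha^{(u)}_{k} \end{bmatrix},$$ (66)

where $K^{(u)}_{k} = (A_{k} P^{(u)}_{k/k-1} C_{k}^{T} + B_{k} \hat{Q}^{(u)}_{k} D_{k}^{T})(\Omega^{(u)}_{k})^{-1}$, $\Omega^{(u)}_{k} = C_{k} P^{(u)}_{k/k-1} C_{k}^{T} +
D_{k} \hat{Q}^{(u)}_{k} D_{k}^{T} + R_{k}$ and $P^{(u)}_{k/k-1}$ evolves from the Riccati difference equation
$P^{(u)}_{k+1/k} = A_{k} P^{(u)}_{k/k-1} A_{k}^{T} - (A_{k} P^{(u)}_{k/k-1} C_{k}^{T} + B_{k} \hat{Q}^{(u)}_{k} D_{k}^{T})(C_{k} P^{(u)}_{k/k-1} C_{k}^{T} + D_{k} \hat{Q}^{(u)}_{k} D_{k}^{T} +
R_{k})^{-1}(C_{k} P^{(u)}_{k/k-1} A_{k}^{T} + D_{k} \hat{Q}^{(u)}_{k} B_{k}^{T}) + B_{k} \hat{Q}^{(u)}_{k} B_{k}^{T}$. A smoothing EM algorithm for
iteratively re-estimating $\hat{Q}^{(u)}$ is described below.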


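Procedure 4. Suppose that an initial estimate $\hat{Q}^{(1)} = diag((\hat{\sigma}^{(1)}_{1,w})^{2}, (\hat{\sigma}^{(1)}_{2,w})^{2}, ...,
(\hat{\sigma}^{(1)}_{n,w})^{2})$ is available. Then subsequent estimates $\hat{Q}^{(u)}$, u > 1, are calculated by
repeating the following two steps.
Step 1. Use $\hat{Q}^{(u)} = diag((\hat{\sigma}^{(u)}_{1,w})^{2}, (\hat{\sigma}^{(u)}_{2,w})^{2}, ..., (\hat{\sigma}^{(u)}_{n,w})^{2})$ within (65) - (66) to
calculate smoothed input estimates $\hat{w}^{(u)}_{k/N}$.
Step 2. Calculate the elements of $\hat{Q}^{(u+1)} = diag((\hat{\sigma}^{(u+1)}_{1,w})^{2}, (\hat{\sigma}^{(u+1)}_{2,w})^{2}, ...,
(\hat{\sigma}^{(u+1)}_{n,w})^{2})$ using $\hat{w}^{(u)}_{k/N}$ from Step 1 instead of $w_{k}$ within the MLE
(46).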
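
The structure of Procedure 4 can be sketched for a scalar system with A known and B = C = 1. Two substitutions are made and should be read as assumptions: a Rauch-Tung-Striebel fixed-interval smoother stands in for the minimum-variance smoother (65) - (66), with the input estimate recovered from the smoothed states, and a plain sample variance of the input estimates stands in for the MLE (46). All names and numerical values are illustrative.

```python
import numpy as np

def smoothed_inputs(z, a, q_hat, r):
    """Scalar RTS smoother designed with q_hat; returns w_hat_k = x_{k+1/N} - a x_{k/N}."""
    n = len(z)
    x_pred, p_pred = np.empty(n), np.empty(n)
    x_filt, p_filt = np.empty(n), np.empty(n)
    xp, pp = 0.0, q_hat
    for k in range(n):                          # forward Kalman filter pass
        x_pred[k], p_pred[k] = xp, pp
        gain = pp / (pp + r)
        x_filt[k] = xp + gain * (z[k] - xp)
        p_filt[k] = (1.0 - gain) * pp
        xp, pp = a * x_filt[k], a * p_filt[k] * a + q_hat
    x_smooth = x_filt.copy()
    for k in range(n - 2, -1, -1):              # backward smoothing pass
        g = p_filt[k] * a / p_pred[k + 1]
        x_smooth[k] = x_filt[k] + g * (x_smooth[k + 1] - x_pred[k + 1])
    return x_smooth[1:] - a * x_smooth[:-1]

def em_process_noise(z, a, q_init, r, iterations=5):
    """Smoothing EM iterations for a scalar process noise variance."""
    q_hat = q_init
    for _ in range(iterations):
        w_hat = smoothed_inputs(z, a, q_hat, r)   # Step 1
        q_hat = float(np.mean(w_hat ** 2))        # Step 2: sample variance of the input estimates
    return q_hat

# Assumed high-SNR simulation settings for the illustration.
rng = np.random.default_rng(0)
a, q_true, r, N = 0.6, 0.2, 0.001, 20_000
x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = a * x[k] + rng.normal(0.0, np.sqrt(q_true))
z = x + rng.normal(0.0, np.sqrt(r), N)

q_est = em_process_noise(z, a, q_init=1.0, r=r)
print(q_est)
```

At this low measurement noise level the smoothed input estimates are close to the actual inputs, so the iterates settle near the true variance.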


8.4.1.2 Properties


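In the following it is shown that the variance estimates arising from the above
procedure result in monotonic error covariances. The additional term within the
design Riccati difference equation (57) that accounts for the presence of parameter
error is now given by $S^{(u)}_{k} = B(\hat{Q}^{(u)} - Q)B^{T}$. Let $\Delta^{(u)}$ denote an approximate
spectral factor arising in the design of a smoother using $P^{(u)}_{k/k-1}$ and $K^{(u)}_{k}$.
Employing the notation and approach of Chapter 7, it is straightforward to show
that

$$\Delta^{(u)}(\Delta^{(u)})^{H} = \Delta\Delta^{H} + C\mathcal{G}_{0}(P^{(u)}_{k/k-1} - P^{(u)}_{k+1/k} + S^{(u)}_{k})\mathcal{G}_{0}^{H}C^{T}.$$ (67)

Define the stacked vectors $v = [v_{1}^{T}, ..., v_{N}^{T}]^{T}$, $w = [w_{1}^{T}, ..., w_{N}^{T}]^{T}$, $\hat{w}^{(u)} =
[(\hat{w}^{(u)}_{1/N})^{T}, ..., (\hat{w}^{(u)}_{N/N})^{T}]^{T}$ and $\tilde{w}^{(u)} = w - \hat{w}^{(u)} = [(\tilde{w}^{(u)}_{1/N})^{T}, ..., (\tilde{w}^{(u)}_{N/N})^{T}]^{T}$. The
input estimation error is generated by $\tilde{w}^{(u)} = \mathcal{R}^{(u)}_{ei}\begin{bmatrix} v \\ w \end{bmatrix}$, where $\mathcal{R}^{(u)}_{ei}(\mathcal{R}^{(u)}_{ei})^{H} =
\mathcal{R}^{(u)}_{ei1}(\mathcal{R}^{(u)}_{ei1})^{H} + \mathcal{R}^{(u)}_{ei2}(\mathcal{R}^{(u)}_{ei2})^{H}$ in which

$$\mathcal{R}^{(u)}_{ei2} = Q\mathcal{G}^{H}\big((\Delta^{(u)}(\Delta^{(u)})^{H})^{-1} - (\Delta\Delta^{H})^{-1}\big)\Delta$$ (68)

and $\mathcal{R}_{ei1}(\mathcal{R}_{ei1})^{H} = Q - Q\mathcal{G}^{H}(\Delta\Delta^{H})^{-1}\mathcal{G}Q$. It is shown in the lemma below
that the sequence $\|\tilde{w}^{(u)}(\tilde{w}^{(u)})^{T}\|_{2} = \|\mathcal{R}^{(u)}_{ei}(\mathcal{R}^{(u)}_{ei})^{H}\|_{2}$ is monotonically non-
increasing or monotonically non-decreasing, depending on the initial conditions.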


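Lemma 9: In respect of Procedure 4 for estimating Q, suppose the following:
(i) the system (23) - (24) is non-minimum phase, in which A, B, C, D, R are
known, $|\lambda_{i}(A)| < 1$, i = 1, ..., n, the pair (A, C) is observable and D is of
full rank;
(ii) the solutions $P^{(1)}_{1/0}$, $P^{(2)}_{1/0}$ of (57) for $\hat{Q}^{(1)} \geq \hat{Q}^{(2)}$ satisfy $P^{(2)}_{1/0} \leq P^{(1)}_{1/0}$ (or the
solutions $P^{(1)}_{1/0}$, $P^{(2)}_{1/0}$ of (57) for $\hat{Q}^{(2)} \geq \hat{Q}^{(1)}$ satisfy $P^{(1)}_{1/0} \leq P^{(2)}_{1/0}$).

Then:
(i) $P^{(u+1)}_{k+1/k} \leq P^{(u)}_{k+1/k}$ (or $P^{(u)}_{k+1/k} \leq P^{(u+1)}_{k+1/k}$) for all k, u ≥ 1;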


ce (u+1) (u) (u+1) (u) (u) (u+1) (u) 
(i) Pate S Poin ond Pa S Pia (or Poin S Fain and Pi S 


Po) jor allk, u>1; 


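(iii) $\|\mathcal{R}^{(u+1)}_{ei}(\mathcal{R}^{(u+1)}_{ei})^{H}\|_{2} \leq \|\mathcal{R}^{(u)}_{ei}(\mathcal{R}^{(u)}_{ei})^{H}\|_{2}$ (or $\|\mathcal{R}^{(u)}_{ei}(\mathcal{R}^{(u)}_{ei})^{H}\|_{2} \leq
\|\mathcal{R}^{(u+1)}_{ei}(\mathcal{R}^{(u+1)}_{ei})^{H}\|_{2}$) for u ≥ 1.

Proof: (i) and (ii) This follows from $S^{(u+1)} \leq S^{(u)}$ within condition (iii) of Theorem 2
of Chapter 8. (iii) Since $\mathcal{R}_{ei1}(\mathcal{R}_{ei1})^{H}$ is common to $\mathcal{R}^{(u)}_{ei}(\mathcal{R}^{(u)}_{ei})^{H}$ and
$\mathcal{R}^{(u+1)}_{ei}(\mathcal{R}^{(u+1)}_{ei})^{H}$, it suffices to show that

$$\|\mathcal{R}^{(u+1)}_{ei2}(\mathcal{R}^{(u+1)}_{ei2})^{H}\|_{2} \leq \|\mathcal{R}^{(u)}_{ei2}(\mathcal{R}^{(u)}_{ei2})^{H}\|_{2}.$$ (69)

Substituting (67) into (68) yields

$$\mathcal{R}^{(u)}_{ei2} = Q\mathcal{G}^{H}\big((\Delta\Delta^{H} + C\mathcal{G}_{0}(P^{(u)}_{k/k-1} - P^{(u)}_{k+1/k} + S^{(u)}_{k})\mathcal{G}_{0}^{H}C^{T})^{-1} - (\Delta\Delta^{H})^{-1}\big)\Delta.$$ (70)

Note for linear time-invariant systems X, $Y_{1} \geq Y_{2}$, that

$$(XX^{H})^{-1} - (XX^{H} + Y_{1})^{-1} \geq (XX^{H})^{-1} - (XX^{H} + Y_{2})^{-1}.$$ (71)

Since $\|\mathcal{G}_{0}(P^{(u+1)}_{k/k-1} - P^{(u+1)}_{k+1/k} + S^{(u+1)}_{k})\mathcal{G}_{0}^{H}\|_{2} \leq \|\mathcal{G}_{0}(P^{(u)}_{k/k-1} - P^{(u)}_{k+1/k} + S^{(u)}_{k})\mathcal{G}_{0}^{H}\|_{2}$, (69)
follows from (70) and (71).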


As is the case for the filtering EM algorithm, the process noise variance estimates 
asymptotically approach the exact values when the SNR is sufficiently high. 


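Lemma 10: Under the conditions of Lemma 9,

$$\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \hat{Q}^{(u)} = Q.$$ (72)

Proof: By inspection of the input estimator, $\mathcal{H}_{IE} = Q\mathcal{G}^{H}(\Delta\Delta^{H})^{-1} =
Q\mathcal{G}^{H}(\mathcal{G}Q\mathcal{G}^{H} + R)^{-1}$, it follows that $\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \mathcal{H}_{IE} = \mathcal{G}^{-1}$ and therefore
$\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \hat{w}^{(u)}_{k/N} = w_{k}$, which implies (72), since the MLE (46) is unbiased for
large N.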


It is observed anecdotally that the variance estimates produced by the above 
smoothing EM algorithm are more accurate than those from the corresponding 
filtering procedure. This is consistent with the following comparison of 
approximate CRLBs. 


Lemma 11:

$$-\left(\frac{\partial^{2} \log f(\sigma^{2}_{i,w} | \hat{x}_{k/N})}{(\partial \sigma^{2}_{i,w})^{2}}\right)^{-1} < -\left(\frac{\partial^{2} \log f(\sigma^{2}_{i,w} | \hat{x}_{k/k})}{(\partial \sigma^{2}_{i,w})^{2}}\right)^{-1}.$$ (73)

Proof: The vector state elements within (23) can be written in terms of smoothed
state estimates, $x_{i,k+1} = A_{i}\hat{x}_{k/N} + w_{i,k} = A_{i}x_{k} + w_{i,k} - A_{i}\tilde{x}_{k/N}$, where $\tilde{x}_{k/N} = x_{k} -
\hat{x}_{k/N}$. From the approach of Example 8, the second partial derivative of the
corresponding approximate log-likelihood function with respect to the process
noise variance is

$$\frac{\partial^{2} \log f(\sigma^{2}_{i,w} | \hat{x}_{k/N})}{(\partial \sigma^{2}_{i,w})^{2}} = -\frac{N}{2}\big(\sigma^{2}_{i,w} + A_{i}E\{\tilde{x}_{k/N}\tilde{x}^{T}_{k/N}\}A_{i}^{T}\big)^{-2}.$$

Similarly, the use of filtered state estimates leads to

$$\frac{\partial^{2} \log f(\sigma^{2}_{i,w} | \hat{x}_{k/k})}{(\partial \sigma^{2}_{i,w})^{2}} = -\frac{N}{2}\big(\sigma^{2}_{i,w} + A_{i}E\{\tilde{x}_{k/k}\tilde{x}^{T}_{k/k}\}A_{i}^{T}\big)^{-2}.$$

The minimum-variance smoother minimises both the causal part and the non-
causal part of the estimation error, whereas the Kalman filter only minimises the
causal part. Therefore, $E\{\tilde{x}_{k/N}\tilde{x}^{T}_{k/N}\} < E\{\tilde{x}_{k/k}\tilde{x}^{T}_{k/k}\}$. Thus, the claim (73) follows.


8.4.2 State Matrix Estimation 


8.4.2.2 EM Algorithm 


Smoothed state estimates are obtained from the smoothed inputs via

$$\hat{x}^{(u)}_{k+1/N} = A_{k}\hat{x}^{(u)}_{k/N} + B_{k}\hat{w}^{(u)}_{k/N}.$$ (74)

The resulting $\hat{x}^{(u)}_{k/N}$ are used below to iteratively re-estimate state matrix elements.


Procedure 5. Assume that there exists an initial estimate $\hat{A}^{(1)}$ of A such that
$|\lambda_{i}(\hat{A}^{(1)})| < 1$, i = 1, ..., n. Subsequent estimates, $\hat{A}^{(u)}$, u > 1, are calculated using
the following two-step EM algorithm.
Step 1. Operate the minimum-variance smoother recursions (65), (66), (74)
designed with $\hat{A}^{(u)}$ to obtain $\hat{x}^{(u)}_{k/N}$.
Step 2. Copy $\hat{A}^{(u)}$ to $\hat{A}^{(u+1)}$. Use $\hat{x}^{(u)}_{k/N}$ instead of $x_{k}$ within (53) to obtain
candidate estimates $\hat{a}^{(u+1)}_{i,j}$, i, j = 1, ..., n. Include $\hat{a}^{(u+1)}_{i,j}$ within $\hat{A}^{(u+1)}$ if
$|\lambda_{i}(\hat{A}^{(u+1)})| < 1$, i = 1, ..., n.


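8.4.2.3 Properties

Denote $x = [x_{1}^{T}, ..., x_{N}^{T}]^{T}$, $\hat{x}^{(u)} = [(\hat{x}^{(u)}_{1/N})^{T}, ..., (\hat{x}^{(u)}_{N/N})^{T}]^{T}$ and $\tilde{x}^{(u)} = x - \hat{x}^{(u)} =
[(\tilde{x}^{(u)}_{1/N})^{T}, ..., (\tilde{x}^{(u)}_{N/N})^{T}]^{T}$. Let $\mathcal{R}^{(u)}_{ei}$ be redefined as the system that maps the inputs
$\begin{bmatrix} v \\ w \end{bmatrix}$ to the smoother state estimation error $\tilde{x}^{(u)}$, that is, $\tilde{x}^{(u)} = \mathcal{R}^{(u)}_{ei}\begin{bmatrix} v \\ w \end{bmatrix}$. It is stated
below that the estimated state matrix iterates result in a monotonic sequence of
state error covariances.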


Lemma 12: In respect of Procedure 5 for estimating A and x, suppose the 

following: 
(i) the system (23) — (24) is non-minimum phase, in which B, C, D, QO, R are 
known, | A (AM) | <1, the pair (A, C) is observable and D is of full rank; 
(ii) there exist solutions PB"), P‘ of (57) for AA’ < < satisfying < (or the 
solutions , of (31) for < <AA' satisfying <). 

Then < (or <) foru21. 


The proof is omitted since it follows mutatis mutandis from that of Lemma 9. 
Suppose that the smoother (65), (66) designed with the estimates is employed to 
calculate input estimates . An approximate log-likelihood function for the 
unknown given samples of is 


iw 


A (u N N u 1 u))\—- ~ A (u A(u 
log f(a, | 2h) => logda —Slosasn)* — (2) De Wah ah) 3) 
k=l 


v ‘ : : 
Now let # denote the map from to the smoother input estimation error 
w 


ww”? =w— Ww at iteration u. It is argued below that the sequence of state matrix 


iterates maximises (75). 


Lemma 13: Under the conditions of Lemma 12, 


foru=1. 


Fe ges? < jerry" |, 


The proof follows mutatis mutandis from that of Lemma 9. The above lemma
implies

$$E\{\tilde{w}^{(u+1)}(\tilde{w}^{(u+1)})^{T}\} \leq E\{\tilde{w}^{(u)}(\tilde{w}^{(u)})^{T}\}.$$ (76)


It follows from Ww? =w- Ww that EW)" = Elw + ww + WY) = 
E(w) + O, which together with (76) implies EW?) < 
EW" (w)"} and log f(a, |Win) 2 log f(a, | WY.) for all w= 1. Therefore, it 


iyo ik/IN i,kIK 

is expected that the sequence of state matrix estimates will similarly vary 
monotonically. Next, it is stated that the state matrix estimates asymptotically 
approach the exact values when the SNR is sufficiently high. 


Lemma 14: Under the conditions of Lemma 9,

$$\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \hat{A}^{(u)} = A.$$ (77)

Proof: From the proof of Lemma 10, $\lim_{Q^{-1} \to 0, R \to 0, u \to \infty} \hat{w}^{(u)}_{k/N} = w_{k}$; therefore, the states
within (74) are reconstructed exactly. Thus, the claim (77) follows since the MLE
(53) is unbiased.


It is expected that the above EM smoothing algorithm offers improved state matrix 
estimation accuracy. 


Lemma 15:

$$-\left(\frac{\partial^{2} \log f(a_{i,j} | \hat{x}_{k/N})}{(\partial a_{i,j})^{2}}\right)^{-1} < -\left(\frac{\partial^{2} \log f(a_{i,j} | \hat{x}_{k/k})}{(\partial a_{i,j})^{2}}\right)^{-1}.$$ (78)

Proof: Using smoothed states within (51) yields $x_{i,k+1} = \sum_{j=1}^{n} a_{i,j}\hat{x}_{j,k/N} + w_{i,k} =
\sum_{j=1}^{n} a_{i,j}x_{j,k} + w_{i,k} - \sum_{j=1}^{n} a_{i,j}\tilde{x}_{j,k/N}$, where $\tilde{x}_{k/N} = x_{k} - \hat{x}_{k/N}$. The second partial
derivative of the corresponding log-likelihood function with respect to $a_{i,j}$ is

$$\frac{\partial^{2} \log f(a_{i,j} | \hat{x}_{k/N})}{(\partial a_{i,j})^{2}} = -\big(\sigma^{2}_{i,w} + A_{i}E\{\tilde{x}_{k/N}\tilde{x}^{T}_{k/N}\}A_{i}^{T}\big)^{-1}\sum_{k=1}^{N} x_{j,k}^{2}.$$

Similarly, the use of filtered state estimates leads to

$$\frac{\partial^{2} \log f(a_{i,j} | \hat{x}_{k/k})}{(\partial a_{i,j})^{2}} = -\big(\sigma^{2}_{i,w} + A_{i}E\{\tilde{x}_{k/k}\tilde{x}^{T}_{k/k}\}A_{i}^{T}\big)^{-1}\sum_{k=1}^{N} x_{j,k}^{2}.$$

The result (78) follows since $E\{\tilde{x}_{k/N}\tilde{x}^{T}_{k/N}\} < E\{\tilde{x}_{k/k}\tilde{x}^{T}_{k/k}\}$.


[Figure 6: $\hat{A}^{(u)}$ versus iteration number u, with curves labelled "filter & smoother, R=.0003" and "filter & smoother, R=.0001".]

Fig. 6. State matrix estimates calculated by the smoother EM algorithm and filter EM algorithm for
Example 13. It can be seen that the $\hat{A}^{(u)}$ better approach the nominal A at higher SNR.


Example 13. Consider a system where B = C = D = Q = 1, R = {0.0001, 0.0002,
0.0003} are known and A = 0.9 is unknown. Simulations were conducted
using 30 noise realisations with N = 500,000. The results of the above smoothing
EM algorithm and the filtering EM algorithms, initialised with $\hat{A}^{(1)}$ = 1.034, are
respectively shown by the dotted and dashed lines within Fig. 6. The figure shows
that the estimates improve with increasing u, which is consistent with Lemma 15.
The estimates also improve with increasing SNR, which illustrates Lemmas 8 and
14. It is observed anecdotally that the smoother EM algorithm outperforms the
filter EM algorithm for estimation of A at high signal-to-noise ratios.


8.4.3 Measurement Noise Variance Estimation 


The discussion of an EM procedure for measurement noise variance estimation is 
presented in a summary form because it follows analogously to the algorithms 
described previously. 


Procedure 6. Assume that an initial estimate $\hat{R}^{(1)}$ of R is available. Subsequent
estimates $\hat{R}^{(u)}$, u > 1, are calculated by repeating the following two-step
procedure.
Step 1. Operate the minimum-variance smoother (7.66), (7.68), (7.69) designed
with $\hat{R}^{(u)}$ to obtain corrected output estimates $\hat{y}^{(u)}_{k/N}$.
Step 2. For i = 1, ..., p, use $\hat{y}^{(u)}_{k/N}$ instead of $y_{k}$ within (27) to obtain $\hat{R}^{(u+1)}$ =
$diag((\hat{\sigma}^{(u+1)}_{1,v})^{2}, (\hat{\sigma}^{(u+1)}_{2,v})^{2}, ..., (\hat{\sigma}^{(u+1)}_{p,v})^{2})$.

It can be shown using the approach of Lemma 9 that the sequence of measurement
noise variance estimates is either monotonically non-increasing or non-
decreasing depending on the initial conditions. When the SNR is sufficiently low,
the measurement noise variance estimates converge to the actual value.


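Lemma 16: In respect of Procedure 6,

$$\lim_{R^{-1} \to 0, Q \to 0, u \to \infty} \hat{R}^{(u)} = R.$$ (79)

Proof: By inspection of the output estimator, $\mathcal{H}_{OE} = \mathcal{G}Q\mathcal{G}^{H}(\mathcal{G}Q\mathcal{G}^{H} + R)^{-1}$, it follows that
$\lim_{R^{-1} \to 0, Q \to 0, u \to \infty} \mathcal{H}_{OE} = 0$, which together with the observation $\lim_{R^{-1} \to 0, Q \to 0, u \to \infty} E\{zz^{T}\} = R$
implies (79), since the MLE (27) is unbiased for large N.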


Once again, the variance estimates produced by the above procedure are expected 
to be more accurate than those relying on filtered estimates. 


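Lemma 17:

$$-\left(\frac{\partial^{2} \log f(\sigma^{2}_{i,v} | \hat{y}_{i,k/N})}{(\partial \sigma^{2}_{i,v})^{2}}\right)^{-1} < -\left(\frac{\partial^{2} \log f(\sigma^{2}_{i,v} | \hat{y}_{i,k/k})}{(\partial \sigma^{2}_{i,v})^{2}}\right)^{-1}.$$ (80)

Proof: The second partial derivative of the corresponding log-likelihood function
with respect to the measurement noise variance is

$$\frac{\partial^{2} \log f(\sigma^{2}_{i,v} | \hat{y}_{i,k/N})}{(\partial \sigma^{2}_{i,v})^{2}} = -\frac{N}{2}\big(\sigma^{2}_{i,v} + E\{\tilde{y}_{i,k/N}\tilde{y}^{T}_{i,k/N}\}\big)^{-2},$$

where $\tilde{y}_{k/N} = y_{k} - \hat{y}_{k/N}$. Similarly, the use of filtered state estimates leads to

$$\frac{\partial^{2} \log f(\sigma^{2}_{i,v} | \hat{y}_{i,k/k})}{(\partial \sigma^{2}_{i,v})^{2}} = -\frac{N}{2}\big(\sigma^{2}_{i,v} + E\{\tilde{y}_{i,k/k}\tilde{y}^{T}_{i,k/k}\}\big)^{-2},$$

where $\tilde{y}_{k/k} = y_{k} - \hat{y}_{k/k}$. The claim (80) follows since $E\{\tilde{y}_{k/N}\tilde{y}^{T}_{k/N}\} < E\{\tilde{y}_{k/k}\tilde{y}^{T}_{k/k}\}$.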


8.5 Unbiased Estimation of State Space Parameters


8.5.1 Overview 


Although the previously-discussed EM algorithms can yield improved state matrix 
and input covariance estimates, it has been found that they are only accurate when 
the measurement noise is negligible (see Lemmas 10 and 14). Similarly, in
subspace identification, least-squares estimation of unknown state-space matrices 
can lead to biased results when the states are corrupted by noise (see [29] — [31] 
and the references therein). Techniques have been previously reported for the 
elimination of bias within autoregressive moving average (ARMA) models, 
including noise estimation [29] - [31], compensation [32], prefiltering [33] and 
iterative methods [31]. 


This section attends to unbiased estimation of state-space parameters. In common 
with ARMA model parameter estimation [29] - [31], a correction term is 
introduced to eliminate bias error. Unbiased, consistent, closed-form estimates of 
a state matrix, an input covariance and a measurement noise covariance are 
developed. It is established under simplifying conditions that they are equal to 
MLEs and attain the corresponding CRLBs. A search procedure is described for 
problems in which both the state matrix and the measurement noise covariance are 
unknown. The remainder of this section is organised as follows. Section 8.5.2 
defines the problem of interest. The estimation of state-space parameters from 
noisy observations is described in Section 8.5.3. An example that demonstrates 
filtering performance benefits is presented in Section 8.5.4. 


8.5.2 Problem Definition

As before, let $w_{k} \in R^{n}$ denote an uncorrelated white sequence. Consider again a
time-invariant state-space system having a realisation of the form

$$x_{k+1} = Ax_{k} + w_{k},$$ (81)

$$y_{k} = Cx_{k},$$ (82)

where $x_{k} \in R^{n}$ is the internal state, $y_{k} \in R^{m}$ is the output, $A \in R^{n \times n}$ and $C \in
R^{m \times n}$. Observations of (82) are modelled as

$$z_{k} = y_{k} + v_{k},$$ (83)

where $v_{k} \in R^{m}$ is an independent, white measurement noise sequence.


It is convenient to write (81) - (83) in block form over integer k ∈ [1, N] as

$$X_{k+1} = AX_{k} + W_{k},$$ (84)
$$Y_{k} = CX_{k},$$ (85)
$$Z_{k} = Y_{k} + V_{k},$$ (86)

where $X_{k+1} = \begin{bmatrix} x_{1,2} & \cdots & x_{1,N+1} \\ \vdots & & \vdots \\ x_{n,2} & \cdots & x_{n,N+1} \end{bmatrix}$, $X_{k} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,N} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,N} \end{bmatrix}$, $W_{k} = \begin{bmatrix} w_{1,1} & \cdots & w_{1,N} \\ \vdots & & \vdots \\ w_{n,1} & \cdots & w_{n,N} \end{bmatrix}$
$\in R^{n \times N}$; $Y_{k} = \begin{bmatrix} y_{1,1} & \cdots & y_{1,N} \\ \vdots & & \vdots \\ y_{m,1} & \cdots & y_{m,N} \end{bmatrix}$, $V_{k} = \begin{bmatrix} v_{1,1} & \cdots & v_{1,N} \\ \vdots & & \vdots \\ v_{m,1} & \cdots & v_{m,N} \end{bmatrix}$ and $Z_{k} = \begin{bmatrix} z_{1,1} & \cdots & z_{1,N} \\ \vdots & & \vdots \\ z_{m,1} & \cdots & z_{m,N} \end{bmatrix}$
$\in R^{m \times N}$.

Recall that $(.)^{T}$ and $\delta_{j,k} = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k \end{cases}$ denote the transpose and the Kronecker
delta function, respectively. It is convenient to make some assumptions below in
which $\lambda_{i}(A)$ denotes the i-th eigenvalue of A and E{.} is the expectation operator.

Assumption 1: (a) $|\lambda_{i}(A)| < 1$, i = 1, ..., n. (b) The system (81) - (82) is reachable.
(c) The pair (A, C) is observable. (d) $E\{w_{k}\} = E\{v_{k}\} = 0$. (e) $E\{w_{j}w_{k}^{T}\} = Q\delta_{j,k}$.
(f) $E\{v_{j}v_{k}^{T}\} = R\delta_{j,k}$. (g) $E\{w_{j}v_{k}^{T}\} = 0$.

Assumption 1(a) ensures that the system (81) - (82) is asymptotically stable.
Assuming reachability, 1(b), means that all modes of the system will be excited.
Observability is assumed, 1(c), so that the states can be recovered from the
system's output. The zero-mean 1(d) and uncorrelated noise assumptions 1(e) -
(g) allow the standard form of the Kalman filter to be employed for calculating
minimum mean-square error (MSE) state estimates from measurements (83). Note
from (81) and Assumption 1(e) that $E\{x_{k}w_{k}^{T}\} = 0$, i.e., $x_{k}$ is uncorrelated with
$w_{k}$.


This section addresses the problem of estimating unbiased A, Q and R parameters
from noisy measurement sequences. This is motivated by a desire to construct
optimal predictors, filters and smoothers that return better MSE performance than
those designed with conventional least-squares parameter estimates.


8.5.3 Identification of Model Parameters 


8.5.3.1 Prerequisites 


The optimal filter gain minimises the predicted and filtered error covariances, 
which relies on having precise knowledge of parameters A, C, Q and R. If these 
parameters are not known accurately and a different gain is calculated then the 
observed error covariances will be larger. That is, estimation performance 
degrades when the identified model parameters are inaccurate. 
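
The degradation can be illustrated numerically for a scalar system with C = 1. The sketch below assumes that only R is mis-specified (A and Q are taken as exact), designs a steady-state gain for the assumed R, and then evaluates the error variance that this gain actually produces on the true system. The function names and numerical values are illustrative assumptions.

```python
def steady_state_gain(a, q, r, iterations=500):
    """Iterate the scalar Riccati recursion and return the steady-state filter gain."""
    p_pred = q
    for _ in range(iterations):
        gain = p_pred / (p_pred + r)
        p_pred = a * (1.0 - gain) * p_pred * a + q
    return p_pred / (p_pred + r)

def actual_filtered_variance(gain, a, q, r, iterations=500):
    """Filtered error variance obtained when a fixed gain is applied to the true system."""
    p_filt = q
    for _ in range(iterations):
        p_pred = a * p_filt * a + q
        p_filt = (1.0 - gain) ** 2 * p_pred + gain ** 2 * r   # Joseph-form update
    return p_filt

a, q, r_true, r_wrong = 0.9, 1.0, 1.0, 10.0
p_opt = actual_filtered_variance(steady_state_gain(a, q, r_true), a, q, r_true)
p_mis = actual_filtered_variance(steady_state_gain(a, q, r_wrong), a, q, r_true)
print(p_opt, p_mis)   # the mismatched design gives the larger error variance
```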


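Use will be made of the Moore-Penrose pseudo inverse of C, which is denoted by
$C^{\dagger} = (C^{T}C)^{-1}C^{T} \in R^{n \times m}$. This is a generalisation of the matrix inverse and can
be calculated from the Singular Value Decomposition. It is commonly used to find
a least-squares solution to an overdetermined system of linear equations [34]. An
additional assumption is stated which enables $x_{k}$ to be estimated directly from $z_{k}$
using the pseudo inverse.

Assumption 2: C is full column rank so that $C^{\dagger}$ exists and $C^{\dagger}C = I$. Under the
assumed condition, state estimates can be calculated as

$$\hat{x}_{k} = C^{\dagger}z_{k} = x_{k} + C^{\dagger}v_{k}.$$ (87)

From (87) and Assumption 1(d), note that $E\{\hat{x}_{k} - x_{k}\} = 0$, i.e., the $\hat{x}_{k}$ are unbiased.

By defining $\hat{X}_{k+1} = \begin{bmatrix} \hat{x}_{1,2} & \cdots & \hat{x}_{1,N+1} \\ \vdots & & \vdots \\ \hat{x}_{n,2} & \cdots & \hat{x}_{n,N+1} \end{bmatrix}$, $\hat{X}_{k} = \begin{bmatrix} \hat{x}_{1,1} & \cdots & \hat{x}_{1,N} \\ \vdots & & \vdots \\ \hat{x}_{n,1} & \cdots & \hat{x}_{n,N} \end{bmatrix} \in R^{n \times N}$, (87)
may be written in block form

$$\hat{X}_{k} = C^{\dagger}Z_{k} = X_{k} + C^{\dagger}V_{k}.$$ (88)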
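
A small numerical sketch of the pseudo-inverse state estimate (87) follows. The 3 x 2 matrix C, the state value and the noise level are hypothetical choices made only to give a full-column-rank example; `np.linalg.pinv` is NumPy's SVD-based pseudo-inverse and is used here as a cross-check of the closed form.

```python
import numpy as np

# Hypothetical full-column-rank output map (m = 3 measurements, n = 2 states).
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
C_pinv = np.linalg.inv(C.T @ C) @ C.T       # (C^T C)^{-1} C^T

rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])                   # illustrative state
v = rng.normal(0.0, 0.1, 3)                 # illustrative measurement noise
z = C @ x + v

x_hat = C_pinv @ z                          # state estimate as in (87)
print(x_hat)
```

Because the pseudo inverse satisfies $C^{\dagger}C = I$ for this C, the estimate equals the state plus the projected noise term $C^{\dagger}v$.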


In many applications R can be estimated from the observations during periods
when $y_{k}$ is available. Often C is known or selected by the filter designers,
whereas A and Q typically need to be identified. Note that existing subspace
identification techniques [12] - [14], [28] may be used to identify an unknown C.
It is shown below that if R is known, A and Q can be identified from the state
estimates (87) - (88). Conversely, if A is known then R can be identified. A search
procedure is advocated for cases where both A and R are unknown.


8.5.3.2 Identification of A with R Known 


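An unbiased estimate $\hat{A} = \{\hat{a}_{i,j}\}$ of $A = \{a_{i,j}\}$ can be found by either solving or
searching for the minima of cost functions, depending on whether R is known.
Let

$$\hat{X}_{i,k+1} = [\hat{x}_{i,2} \ \cdots \ \hat{x}_{i,N+1}] \in R^{1 \times N}$$

and

$$\hat{A}_{i} = [\hat{a}_{i,1} \ \cdots \ \hat{a}_{i,n}] \in R^{1 \times n}$$

denote the i-th rows of $\hat{X}_{k+1}$ and $\hat{A}$, respectively.

Lemma 18: Suppose R is known. The estimate

$$\hat{A} = E\{\hat{x}_{k+1}\hat{x}_{k}^{T}\}\big(E\{\hat{x}_{k}\hat{x}_{k}^{T}\} - C^{\dagger}RC^{\dagger T}\big)^{-1}$$ (89)

minimises the cost functions

$$J_{i}(\hat{A}_{i}, C, R, \hat{X}_{k}, \hat{X}_{k+1}) = \|\hat{X}_{i,k+1} - \hat{A}_{i}X_{k}\|_{2}^{2},$$ (90)

i = 1, ..., n, where $\|.\|_{2}$ denotes the 2-norm.

Proof: Equations (86) and (88) imply $\hat{X}_{k}\hat{X}_{k}^{T} = X_{k}X_{k}^{T} + X_{k}V_{k}^{T}C^{\dagger T} + C^{\dagger}V_{k}X_{k}^{T} +
C^{\dagger}RC^{\dagger T}$, which together with Assumptions 1(f) - (g) give

$$X_{k}X_{k}^{T} = \hat{X}_{k}\hat{X}_{k}^{T} - C^{\dagger}RC^{\dagger T}.$$ (91)

Let $\nabla_{\hat{A}_{i}}$ denote the gradient with respect to $\hat{A}_{i}$. Setting $\nabla_{\hat{A}_{i}}J_{i}(\hat{A}_{i}, C, R, \hat{X}_{k}, \hat{X}_{k+1})
= -2(\hat{X}_{i,k+1} - \hat{A}_{i}X_{k})X_{k}^{T} = 0$, i = 1, ..., n, leads to

$$\hat{A}_{i} = \hat{X}_{i,k+1}X_{k}^{T}(X_{k}X_{k}^{T})^{-1}.$$ (92)

Using (91) within (92) results in

$$\hat{A}_{i} = E\{\hat{x}_{i,k+1}\hat{x}_{k}^{T}\}\big(E\{\hat{x}_{k}\hat{x}_{k}^{T}\} - C^{\dagger}RC^{\dagger T}\big)^{-1}.$$ (93)

Stacking the rows (93) of $\hat{A}$ yields (89).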


Thus, $\hat{A}$ is calculated from (89) in which $\hat{x}_{k}$ and $\hat{x}_{k+1}$ are found using the pseudo
inverse in (87). A method for estimating an unknown R within (89) is described in
Section 8.5.3.5.
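
A sample-based sketch of the estimate (89) is given below for a scalar system, alongside the uncorrected least-squares estimate for comparison. It assumes that the expectations in (89) are replaced by sample averages over the data record; the function name, seed and parameter values are illustrative.

```python
import numpy as np

def estimate_state_matrix(Z, C, R):
    """Return the bias-corrected estimate (sample form of (89)) and the plain LS estimate."""
    C_pinv = np.linalg.pinv(C)
    X_hat = C_pinv @ Z                        # state estimates as in (87)
    X0, X1 = X_hat[:, :-1], X_hat[:, 1:]
    N = X0.shape[1]
    S10 = X1 @ X0.T / N                       # sample average standing in for E{x_{k+1} x_k^T}
    S00 = X0 @ X0.T / N                       # sample average standing in for E{x_k x_k^T}
    correction = C_pinv @ R @ C_pinv.T        # correction term for noisy states
    A_corrected = S10 @ np.linalg.inv(S00 - correction)
    A_least_squares = S10 @ np.linalg.inv(S00)
    return A_corrected, A_least_squares

# Assumed simulation: deliberately noisy measurements (R comparable to Q).
rng = np.random.default_rng(1)
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
N = 200_000
x = np.zeros((1, N + 1))
w = rng.normal(0.0, 1.0, (1, N))
for k in range(N):
    x[:, k + 1] = A @ x[:, k] + w[:, k]
Z = C @ x + rng.normal(0.0, 1.0, (1, N + 1))

A_corr, A_ls = estimate_state_matrix(Z, C, R)
print(A_corr, A_ls)
```

In this setting the least-squares estimate is pulled towards zero by the measurement noise, whereas subtracting the correction term restores an estimate close to the true value.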


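Lemma 19: Under Assumptions 1 - 2, the estimate (89) is unbiased.
Proof: Equations (81), (87) yield $E\{\hat{x}_{k}\hat{x}_{k}^{T}\} = E\{x_{k}x_{k}^{T}\} + E\{x_{k}v_{k}^{T}\}C^{\dagger T} +
C^{\dagger}E\{v_{k}x_{k}^{T}\} + C^{\dagger}RC^{\dagger T}$, which together with Assumptions 1(f) - (g) give

$$E\{\hat{x}_{k}\hat{x}_{k}^{T}\} = E\{x_{k}x_{k}^{T}\} + C^{\dagger}RC^{\dagger T}.$$ (94)

Right-multiplying (81) by $x_{k}^{T}$ and taking expectations results in $E\{x_{k+1}x_{k}^{T}\}$ =
$AE\{x_{k}x_{k}^{T}\} + E\{x_{k}w_{k}^{T}\}B^{T}$, from which the actual state matrix is specified by

$$A = E\{x_{k+1}x_{k}^{T}\}E\{x_{k}x_{k}^{T}\}^{-1}$$ (95)

since $E\{x_{k}w_{k}^{T}\} = 0$. It follows from (81), (87) that $E\{\hat{x}_{k+1}\hat{x}_{k}^{T}\} = AE\{x_{k}x_{k}^{T}\} +
AE\{x_{k}v_{k}^{T}\}C^{\dagger T} + E\{w_{k}x_{k}^{T}\} + E\{w_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k+1}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k+1}x_{k}^{T}\}$
and using Assumptions 1(f) - (g) leads to

$$E\{\hat{x}_{k+1}\hat{x}_{k}^{T}\} = AE\{x_{k}x_{k}^{T}\} + E\{w_{k}x_{k}^{T}\}.$$ (96)

Substituting (94) - (96) into (89) gives

$$\hat{A} = A + E\{w_{k}x_{k}^{T}\}E\{x_{k}x_{k}^{T}\}^{-1}.$$ (97)

As explained in Section 8.5.2, $x_{k}$ and $w_{k}$ are uncorrelated, so (97) implies
$E\{\hat{A}\} = A$.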


The estimate (89) includes the correction term $C^{\dagger}RC^{\dagger T}$ to accommodate noisy
states. This generalises the least-squares estimate of A that is used within EM
algorithms such as [20]. It is shown below that the estimate converges as the
number of data points increases.

Lemma 20: The estimate (89) is asymptotically consistent.

Proof: From (97) and Assumptions 1(f) - (g), it is found that

$$E\{(\hat{A} - A)(\hat{A} - A)^{T}\} = E\{w_{k}x_{k}^{T}\}E\{x_{k}x_{k}^{T}\}^{-2}E\{x_{k}w_{k}^{T}\}$$
$$= \Big(N^{-1}\sum_{k=1}^{N} w_{k}x_{k}^{T}\Big)E\{x_{k}x_{k}^{T}\}^{-2}\Big(N^{-1}\sum_{k=1}^{N} x_{k}w_{k}^{T}\Big).$$

Since $x_{k}$ and $w_{k}$ are uncorrelated, $\lim_{N \to \infty} E\{(\hat{A} - A)(\hat{A} - A)^{T}\} = 0$.


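The approach of [2] is used to show that (89) attains the corresponding CRLB. It
follows from (87) that $\hat{x}_{k+1} = x_{k+1} + C^{\dagger}v_{k+1} = Ax_{k} + u_{k}$, where $u_{k} = w_{k} +
C^{\dagger}v_{k+1}$ and $E\{u_{k}u_{k}^{T}\} = Q + C^{\dagger}RC^{\dagger T}$. Suppose that $\hat{x}_{k+1} \sim
\mathcal{N}(Ax_{k}, Q + C^{\dagger}RC^{\dagger T})$, i.e., the probability density function of $\hat{x}_{k+1}$ is

$$p(\hat{x}_{k+1}) = \frac{1}{(2\pi)^{nN/2}|Q + C^{\dagger}RC^{\dagger T}|^{N/2}}\exp\Big\{-\frac{1}{2}\sum_{k=1}^{N}(\hat{x}_{k+1} - Ax_{k})^{T}(Q + C^{\dagger}RC^{\dagger T})^{-1}(\hat{x}_{k+1} - Ax_{k})\Big\}.$$

Setting $\nabla_{A}\log(p(\hat{x}_{k+1})) = 0$ leads to an MLE equal to (89). Straightforward
algebraic manipulations result in

$$\nabla_{A}\log(p(\hat{x}_{k+1})) = I(A)(\hat{A} - A),$$ (98)

where

$$I(A) = -\nabla^{2}_{A}\log(p(\hat{x}_{k+1})) = (Q + C^{\dagger}RC^{\dagger T})^{-1}\sum_{k=1}^{N}(\hat{x}_{k}\hat{x}_{k}^{T} - C^{\dagger}RC^{\dagger T})$$

is the corresponding Fisher information matrix, in which $\nabla^{2}_{A}$ denotes the Hessian
with respect to A. Since the estimate (89) satisfies (98), it attains the CRLB and
is an efficient estimator [2].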


8.5.3.3 Identification of Q with R Known 


Unbiased estimates of Q are similarly determined below. Two cases are 
considered, depending on whether the actual or estimated A is available. 


Lemma 21: Under Assumptions 1 - 2 with $x_{0} = 0$:

(a)

$$P = \lim_{k \to \infty} E\{x_{k}x_{k}^{T}\}$$ (99)

or equivalently

$$P = \lim_{k \to \infty} \Gamma_{k}Q\Gamma_{k}^{T}$$ (100)

satisfy

$$BQB^{T} = P - APA^{T},$$ (101)

where $\Gamma_{k} = [I \ A \ \cdots \ A^{k-1}]$ is the reachability matrix.
(b) For cases where the actual A is available, the estimate

$$\hat{Q} = E\{\hat{x}_{k}\hat{x}_{k}^{T}\} - C^{\dagger}RC^{\dagger T} - A(E\{\hat{x}_{k}\hat{x}_{k}^{T}\} - C^{\dagger}RC^{\dagger T})A^{T}$$ (102)

of Q is unbiased.
(c) For cases where the estimate $\hat{A}$ of A is available, the estimate

$$\hat{Q} = E\{\hat{x}_{k}\hat{x}_{k}^{T}\} - C^{\dagger}RC^{\dagger T} - \hat{A}(E\{\hat{x}_{k}\hat{x}_{k}^{T}\} - C^{\dagger}RC^{\dagger T})\hat{A}^{T}$$ (103)

of Q is unbiased.

Proof: (a) Right-multiplying (81) by $x_{k+1}^{T}$, taking expectations and using (99) leads
to (101). Equivalently, in a subspace identification framework [28], [14], with $x_{0}$
= 0, note that $x_{k} = \Gamma_{k}\begin{bmatrix} w_{k-1} \\ \vdots \\ w_{0} \end{bmatrix}$. Denote the k-step controllability gramian by $P_{k}$ =
$\Gamma_{k}Q\Gamma_{k}^{T}$. The (steady state) controllability gramian is given by (100), which satisfies
the Lyapunov equation (101).
(b) Substituting (94) in (102) results in

$$\hat{Q} = Q + E\{x_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k}x_{k}^{T}\} - A(E\{x_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k}x_{k}^{T}\})A^{T}.$$ (104)

Since $x_{k}$ and $v_{k}$ are uncorrelated (from Assumption 1(g)), (104) implies $E\{\hat{Q}\} = Q$.
(c) Using (89) and (94) within (102) results in (103) from which the claim follows.

Thus, Q can be estimated by factorising a lower triangular matrix of the form
(100) or more simply from the Lyapunov equation (101). It is noted below that
(102) and (103) converge as the number of data points increases.
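
A sample-based sketch of (102) for a scalar system with A and R known is shown below. It assumes B = I, replaces the expectation by a sample average, and uses illustrative parameter values and names.

```python
import numpy as np

# Assumed simulation of (81) - (83) for a scalar system.
rng = np.random.default_rng(2)
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
N = 200_000
x = np.zeros((1, N))
w = rng.normal(0.0, 1.0, (1, N))
for k in range(N - 1):
    x[:, k + 1] = A @ x[:, k] + w[:, k]
Z = C @ x + rng.normal(0.0, 1.0, (1, N))

C_pinv = np.linalg.pinv(C)
X_hat = C_pinv @ Z                               # state estimates as in (87)
S00 = X_hat @ X_hat.T / N                        # sample average standing in for E{x_hat x_hat^T}
P_hat = S00 - C_pinv @ R @ C_pinv.T              # noise-corrected state covariance, cf. (94)
Q_hat = P_hat - A @ P_hat @ A.T                  # Lyapunov relation (101), sample form of (102)
print(Q_hat)
```

Replacing A by an estimate in the last line gives the corresponding sample form of (103).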


Lemma 22: The estimates (102) and (103) are asymptotically consistent.

Proof: Since

$$E\{(\hat{Q} - Q)(\hat{Q} - Q)^{T}\} = \big(E\{x_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k}x_{k}^{T}\} - A(E\{x_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k}x_{k}^{T}\})A^{T}\big)$$
$$\times \big(E\{x_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k}x_{k}^{T}\} - A(E\{x_{k}v_{k}^{T}\}C^{\dagger T} + C^{\dagger}E\{v_{k}x_{k}^{T}\})A^{T}\big)^{T}$$

contains factors of the form $N^{-1}\sum_{k=1}^{N} x_{k}v_{k}^{T}$ and $N^{-1}\sum_{k=1}^{N} v_{k}x_{k}^{T}$,
$\lim_{N \to \infty} E\{(\hat{Q} - Q)(\hat{Q} - Q)^{T}\} = 0$.

It is similarly shown that (102) attains the corresponding CRLB. Suppose that
$\hat{x}_{k+1} \sim \mathcal{N}(Ax_{k}, BQB^{T} + C^{\dagger}RC^{\dagger T})$, i.e., the probability density function of $\hat{x}_{k+1}$ is

$$p(\hat{x}_{k+1}) = \frac{1}{(2\pi)^{nN/2}|Q + C^{\dagger}RC^{\dagger T}|^{N/2}}\exp\Big\{-\frac{1}{2}\sum_{k=1}^{N}(\hat{x}_{k+1} - Ax_{k})^{T}(Q + C^{\dagger}RC^{\dagger T})^{-1}(\hat{x}_{k+1} - Ax_{k})\Big\}.$$

By setting $\nabla_{BQB^{T}}\log(p(\hat{x}_{k+1})) = 0$, it is easily confirmed that (102) is an MLE.
Straightforward manipulations reveal that

$$\nabla_{BQB^{T}}\log(p(\hat{x}_{k+1})) = I(Q)(\hat{Q} - Q),$$ (105)

where

$$I(BQB^{T}) = -\nabla^{2}_{BQB^{T}}\log(p(\hat{x}_{k+1})) = 0.5N(Q + C^{\dagger}R(C^{\dagger})^{T})^{-2}$$

is the corresponding Fisher information matrix. Since $\nabla_{BQB^{T}}\log(p(\hat{x}_{k+1}))$ is of the
form (105), the estimate (102) attains the CRLB and is efficient [2].


8.5.3.4 Identification of R with A Known 


Suppose that A is known and its inverse exists. A rearrangement of (89) suggests 
that R can be identified as 


“If automobiles had followed the same development cycle as the computer, a Rolls-Royce would today 
cost $100, get a million miles per gallon, and explode once a year, killing everyone inside.” Mark 
Stephens 




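$$\hat{R} = C\big(E\{\hat{x}_k \hat{x}_k^T\} - A^{-1}E\{\hat{x}_{k+1}\hat{x}_k^T\}\big)C^T. \qquad (106)$$

Lemma 23: Under Assumptions 1 - 2, the estimate (106) is (a) unbiased and (b) consistent.

Proof: (a) Substituting (94) and (96) into (106) yields

$$\hat{R} = R - CA^{-1}E\{w_k x_k^T\}C^T. \qquad (107)$$

Since $x_k$ is uncorrelated with $w_k$, (107) implies $E\{\hat{R}\} = R$.
(b) It follows from

$$E\{(\hat{R} - R)(\hat{R} - R)^T\} = E\Big\{\Big(CA^{-1}\frac{1}{N}\sum_{k=1}^{N} w_k x_k^T C^T\Big)\Big(CA^{-1}\frac{1}{N}\sum_{k=1}^{N} w_k x_k^T C^T\Big)^T\Big\}$$

that $\lim_{N \to \infty} E\{(\hat{R} - R)(\hat{R} - R)^T\} = 0$, since $x_k$ is uncorrelated with $w_k$.

In order to investigate the CRLB for R, suppose that $\hat{x}_k \sim \mathcal{N}(x_k,\, C^{\#}R(C^{\#})^T)$, i.e., the probability density function of $\hat{x}_k$ is

$$p(\hat{x}_k) = \frac{1}{(2\pi)^{N/2}\,|C^{\#}R(C^{\#})^T|^{N/2}} \exp\Big\{-\frac{1}{2}\sum_{k=1}^{N}(\hat{x}_k - x_k)^T\big(C^{\#}R(C^{\#})^T\big)^{-1}(\hat{x}_k - x_k)\Big\}.$$

Setting $\nabla_R \log(p(\hat{x}_k)) = 0$ reveals that (106) is an MLE. It is easily verified that

$$\nabla_R \log(p(\hat{x}_k)) = I(R)(\hat{R} - R), \qquad (108)$$

where $I(R) = -\nabla^2_R \log(p(\hat{x}_k)) = 0.5NR^{-2}$ is the corresponding Fisher information matrix. Since the MLE satisfies (108), it attains the CRLB and is efficient [2].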
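The identification formula (106) can be tried out on synthetic data. The following is a minimal sketch for the scalar case, assuming C = 1 so that the noisy state is simply the true state plus measurement noise; the function name `identify_r` and the values of `a`, `q` and `r` are hypothetical and chosen only for illustration.

```python
import random

def identify_r(a, q, r, n_samples, seed=0):
    """Estimate the measurement noise variance with the scalar form of (106).

    Assumes x[k+1] = a*x[k] + w[k] and noisy states x_hat[k] = x[k] + v[k] (C = 1).
    """
    rng = random.Random(seed)
    x = 0.0
    x_hat_prev = None
    s0 = 0.0    # running sum of x_hat[k] * x_hat[k]
    s1 = 0.0    # running sum of x_hat[k+1] * x_hat[k]
    count = 0
    for _ in range(n_samples + 1):
        x_hat = x + rng.gauss(0.0, r ** 0.5)      # noisy state
        if x_hat_prev is not None:
            s0 += x_hat_prev * x_hat_prev
            s1 += x_hat * x_hat_prev
            count += 1
        x_hat_prev = x_hat
        x = a * x + rng.gauss(0.0, q ** 0.5)      # state recursion
    # R_hat = E{x_hat[k]^2} - a^{-1} E{x_hat[k+1] x_hat[k]}
    return s0 / count - (s1 / count) / a

r_hat = identify_r(a=0.7, q=1.0, r=0.5, n_samples=100_000)
print(f"true R = 0.5, estimated R = {r_hat:.3f}")
```

The estimate fluctuates around the true value from one noise realisation to the next, and the spread shrinks as the number of samples grows, in keeping with the consistency claim of Lemma 23.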


8.5.3.5 Identification of unknown A, Q and R 


The approaches of Sections 8.5.3.2 - 8.5.3.4 are now combined for the problem of estimating unknown A, Q and R. One way of proceeding is to search for the minima of the cost functions (90) for all admissible A and R, i.e.,

$$\{\hat{A}, \hat{R}\} = \arg\min_{A > 0,\; |\lambda_i(A)| < 1,\; R > 0} J(A, C, R, \hat{X}_k, \hat{X}_{k+1}), \qquad (109)$$

in which the relationship between $\hat{X}_k$ and $\hat{X}_{k+1}$ is described by (91). From Lemmas 17 and 23, the search problem (109) has a minimum


“What lies at the heart of every living thing is not a fire, not warm breath, not a ‘spark of life’. It is 
information, words, instructions.” Clinton Richard Dawkins 
$J(A, C, R, \hat{X}_k, \hat{X}_{k+1}) = 0$ at $\hat{A} = A$ and $\hat{R} = R$. After A and R have been identified from (109), the unknown Q can be obtained from (103).


8.5.4 Example 


Model outputs (2) of length N = 1,000,000 were synthesised using

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 0.6 & 0.1 \\ 0 & 0.7 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad Q = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$

and Gaussian noise realisations. Measurements were then generated using independent Gaussian noise realisations within (3), and the signal-to-noise ratio (SNR) was varied in 1 dB steps, from -5 to 5 dB.

Unbiased estimates $\hat{A} = \begin{bmatrix} \hat{a}_{11} & \hat{a}_{12} \\ \hat{a}_{21} & \hat{a}_{22} \end{bmatrix}$ and $\hat{R}$ were obtained from the measurements by searching for the minimum of (109). The search space included 81 candidate A matrices, namely, $\hat{a}_{ij} = a_{ij} - 0.1,\; a_{ij},\; a_{ij} + 0.1$, $i, j = 1, 2$, and 11 candidate R matrices, corresponding to the afore-mentioned SNRs. An unbiased estimate $\hat{Q}$ was then obtained using the identified A and R within (103). Conventional (biased) estimates of A and Q were also calculated by neglecting R within (89) and (103), respectively.
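The size of the search space for A can be checked by enumeration. The sketch below assumes that each of the four elements is perturbed independently by -0.1, 0 or +0.1 about its nominal value, which yields 3^4 = 81 candidates, and that admissibility means the stability constraint $|\lambda_i(A)| < 1$ appearing under (109). The function names are hypothetical, and the cost function (90) itself is not reproduced here.

```python
from itertools import product

import numpy as np

A_NOMINAL = np.array([[0.6, 0.1],
                      [0.0, 0.7]])
OFFSETS = (-0.1, 0.0, 0.1)   # per-element perturbations (assumed grid)

def candidate_state_matrices(a_nominal, offsets):
    """Return every matrix obtained by perturbing each element independently."""
    candidates = []
    for deltas in product(offsets, repeat=a_nominal.size):
        candidates.append(a_nominal + np.array(deltas).reshape(a_nominal.shape))
    return candidates

def is_admissible(a):
    """Stability constraint |lambda_i(A)| < 1 used in the search problem."""
    return bool(np.all(np.abs(np.linalg.eigvals(a)) < 1.0))

candidates = candidate_state_matrices(A_NOMINAL, OFFSETS)
admissible = [a for a in candidates if is_admissible(a)]
print(len(candidates), len(admissible))
```

In a full search, each admissible candidate would be paired with each candidate R and scored with the cost function, and the pair with the smallest cost retained.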


[Figure: mean square error plotted against SNR, dB.]

Fig. 7. Mean square error exhibited by filters designed with: actual A, Q and R (crossed line); unbiased estimates of A, Q and R (dotted line); and biased estimates of A, Q and unbiased estimate of R (dashed line).


“In my lifetime, we've gone from Eisenhower to George W. Bush. We've gone from John F. Kennedy 
to Al Gore. If this is evolution, I believe that in twelve years, we'll be voting for plants.” Lewis Niles 
Black 
An optimal Kalman filter was designed using the actual A, Q and R. The MSEs observed for the filter operating on the measurements are indicated by the crossed line of Fig. 7. A second filter was designed using the unbiased estimates of A, Q and R, for which the resulting MSEs are indicated by the dotted line of Fig. 7. It can be seen that the filters designed with actual and unbiased parameter estimates exhibit indistinguishable performance, which illustrates the asymptotic consistency results of Lemmas 19, 22 and 23.

A third filter was designed using the conventional (biased) least-squares estimates of A and Q, plus the unbiased estimate of R, for which the resulting MSEs are indicated by the dashed line of Fig. 7. It can be seen that this filter exhibits the worst performance. As expected, retaining the correction term, $C^{\#}R(C^{\#})^T$, within (89) and (103) returns a negligible benefit at high SNR.
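The effect of the correction term on the state matrix estimate can be illustrated numerically. The following sketch assumes C = I (so that $C^{\#}R(C^{\#})^T = R$), uses a hypothetical measurement noise covariance R = I, and assumes that the corrected estimate takes the form $\hat{A} = E\{\hat{x}_{k+1}\hat{x}_k^T\}(E\{\hat{x}_k\hat{x}_k^T\} - R)^{-1}$, which is the rearrangement of (106) for A; it is not a reproduction of the search procedure used in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.6, 0.1], [0.0, 0.7]])
Q = np.diag([1.0, 2.0])
R = np.diag([1.0, 1.0])      # hypothetical measurement noise covariance
N = 50_000

# Simulate x[k+1] = A x[k] + w[k] and noisy states x_hat[k] = x[k] + v[k] (C = I).
w = rng.multivariate_normal(np.zeros(2), Q, size=N)
v = rng.multivariate_normal(np.zeros(2), R, size=N)
x = np.zeros((N, 2))
for k in range(N - 1):
    x[k + 1] = A @ x[k] + w[k]
x_hat = x + v

S0 = x_hat[:-1].T @ x_hat[:-1] / (N - 1)   # sample E{x_hat[k] x_hat[k]^T}
S1 = x_hat[1:].T @ x_hat[:-1] / (N - 1)    # sample E{x_hat[k+1] x_hat[k]^T}

A_biased = S1 @ np.linalg.inv(S0)          # conventional estimate, neglects R
A_corrected = S1 @ np.linalg.inv(S0 - R)   # subtracts the correction term

err_biased = np.linalg.norm(A_biased - A)
err_corrected = np.linalg.norm(A_corrected - A)
print(f"biased error {err_biased:.3f}, corrected error {err_corrected:.3f}")
```

With this much measurement noise the conventional estimate is pulled noticeably towards zero, whereas the corrected estimate stays close to the true A. As R shrinks, the two estimates coincide, which is the high-SNR behaviour noted above.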


8.6 Chapter Summary 


Optimal predictor, filter and smoother designs require knowledge of model parameters. The least-squares solutions for estimating A and Q from the data are well known. From the Central Limit Theorem, the mean of a large sample of independent identically distributed random variables asymptotically approaches a normal distribution. Consequently, parameter estimates are often obtained by maximising Gaussian log-likelihood functions. Indeed, the MLEs of A and Q are the same as the least-squares estimates, which are listed in Table 1. They are unbiased provided that the assumed models are correct, perfect (or noiseless) state information is available and the number of samples is sufficiently large.
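For the perfect-state case, the least-squares estimates can be sketched in a few lines. The code below uses the textbook least-squares forms $\hat{A} = (\sum x_{k+1}x_k^T)(\sum x_k x_k^T)^{-1}$ and $\hat{Q} = \frac{1}{N}\sum (x_{k+1} - \hat{A}x_k)(x_{k+1} - \hat{A}x_k)^T$; these are assumed to correspond to the entries of Table 1, whose contents are not reproduced in this section, and the model values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.6, 0.1], [0.0, 0.7]])
Q = np.diag([1.0, 2.0])
N = 50_000

# Noiseless (perfect) states generated by x[k+1] = A x[k] + w[k].
w = rng.multivariate_normal(np.zeros(2), Q, size=N)
x = np.zeros((N + 1, 2))
for k in range(N):
    x[k + 1] = A @ x[k] + w[k]

X0, X1 = x[:-1], x[1:]                      # rows are x[k] and x[k+1]
A_hat = (X1.T @ X0) @ np.linalg.inv(X0.T @ X0)
residual = X1 - X0 @ A_hat.T                # x[k+1] - A_hat x[k]
Q_hat = residual.T @ residual / N

print(np.round(A_hat, 3))
print(np.round(Q_hat, 3))
```

Both estimates approach the true values as N grows, provided that the states are observed without noise.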


Usually, both states and parameters need to be estimated from noisy 
measurements. The EM algorithm is a common technique for solving joint state 
and parameter estimation problems. It has been shown that the estimation 
sequences vary monotonically and depend on the initial conditions. An 
examination of the approximate Cramér-Rao lower bounds shows that the use of 
smoothed states as opposed to filtered states is expected to provide improved 
parameter estimation accuracy. 


When the SNR is sufficiently high, the states can be recovered exactly and the bias errors diminish to zero, in which case $\lim_{Q^{-1} \to 0,\, R \to 0} \hat{Q} = Q$ and $\lim_{Q^{-1} \to 0,\, R \to 0} \hat{A} = A$. Therefore, the process noise covariance and state matrix estimation procedures described herein are advocated when the measurement noise is negligible. Conversely, when the SNR is sufficiently low, that is, when the estimation problem is dominated by measurement noise, then $\lim_{Q \to 0,\, R^{-1} \to 0} \hat{R} = R$. This suggests


“It took less than an hour to make the atoms, a few hundred million years to make the stars and planets, but five billion years to make man!” George Gamow
that measurement noise covariance estimates are best calculated when the signal is absent.


The afore-mentioned high-SNR asymptotes prompt the inclusion of a correction 
term for problems where imperfect or noisy state estimates are available. The 
resulting MLEs for A, Q and R are listed in Table 2, which are unbiased provided 
that the assumed models are correct and the number of samples is sufficiently 
large. It is established that these estimates are asymptotically consistent. It is also 
shown under prescribed conditions that they are MLEs which attain the 
corresponding CRLBs. 


Table 1. Estimates of process noise covariance, state matrix and measurement noise variance, assuming that the actual states are available.

Table 2. Estimates of process noise variance, state matrix element and measurement noise variance, assuming that estimated states are available.


“The faithful duplication and repair exhibited by the double-stranded DNA structure would seem to be 
incompatible with the process of evolution. Thus, evolution has been explained by the occurrence of 
errors during DNA replication and repair.” Tomoyuki Shibata 
In the event that A, Q and R are all unknown, a search may be conducted for the minimum of cost functions for all admissible A and R (as described above). The MLE for Q may be calculated subsequently. It is demonstrated with the aid of a modelling study that a filter designed with the resulting unbiased parameters can outperform a filter designed with conventional parameter estimates.


In summary, for cases where perfect or noiseless states are available, the standard 
least squares or MLEs (shown in Table 1) should be satisfactory. If the states are 
accessible and moderate measurement noise is present, the unbiased MLEs 
(shown in Table 2) can be used. Otherwise designers may have to consider using 
EM algorithms for joint state and parameter estimation. 


8.7 Problems 


Problem 1.
(i) Consider the second order difference equation $x_{k+2} + a_1 x_{k+1} + a_0 x_k = w_k$. Assuming that $w_k \sim \mathcal{N}(0, \sigma_w^2)$, obtain an equation for the MLEs of the unknown $a_1$ and $a_0$.
(ii) Consider the $n$th order autoregressive system $x_{k+n} + a_{n-1}x_{k+n-1} + a_{n-2}x_{k+n-2} + \ldots + a_0 x_k = w_k$, where $a_{n-1}, a_{n-2}, \ldots, a_0$ are unknown. From the assumption $w_k \sim \mathcal{N}(0, \sigma_w^2)$, obtain an equation for MLEs of the unknown coefficients.

Problem 2. Suppose that N samples of $x_{k+1} = Ax_k + w_k$ are available, where $w_k \sim \mathcal{N}(0, \sigma_w^2)$, in which $\sigma_w^2$ is an unknown parameter.
(i) Write down a Gaussian log-likelihood function for the unknown parameter, given $x_k$.
(ii) Derive a formula for the MLE $\hat{\sigma}_w^2$ of $\sigma_w^2$.
(iii) Show that $E\{\hat{\sigma}_w^2\} = \sigma_w^2$, provided that N is large.
(iv) Find the Cramér Rao lower bound for $\hat{\sigma}_w^2$.
(v) Replace the actual states $x_k$ with the filtered state $\hat{x}_{k/k}$ within the MLE formula. Obtain a high SNR asymptote for this approximate MLE.

Problem 3. Consider the state evolution $x_{k+1} = Ax_k + w_k$, where $A \in \mathbb{R}^{n \times n}$ is unknown and $w_k \in \mathbb{R}^n$.
(i) Write down a Gaussian log-likelihood function for the unknown components $a_{ij}$ of A, given $x_k$ and $x_{k+1}$.
(ii) Derive a formula for the MLE $\hat{a}_{ij}$ of $a_{ij}$.
(iii) Show that $E\{\hat{a}_{ij}\} = a_{ij}$. Replace the actual states $x_k$ with the filtered state $\hat{x}_{k/k}$ within the obtained formula to yield an approximate MLE for $a_{ij}$.




G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019


(iv) Obtain a high SNR asymptote for the approximate MLE. 


Problem 4. Consider measurements of a sinusoidal signal modelled by $y_k = A\cos(2\pi f k + \phi) + v_k$, with amplitude $A > 0$, frequency $0 < f < 0.5$, phase $\phi$ and Gaussian white measurement noise $v_k$.
(i) Assuming that $\phi$ and $f$ are known, determine the Fisher information and the Cramér Rao lower bound for an unknown $A$.
(ii) Assuming that $A$ and $\phi$ are known, determine the Fisher information and the Cramér Rao lower bound for an unknown $f$.
(iii) Assuming that $A$ and $f$ are known, determine the Fisher information and the Cramér Rao lower bound for an unknown $\phi$.
(iv) Assuming that the vector parameter $[A,\, f,\, \phi]^T$ is unknown, determine the Fisher information matrix and the Cramér Rao lower bound. (Hint: use small angle approximations for sine and cosine, see [2].)


8.8 Glossary 


SNR: Signal to noise ratio.
MLE: Maximum likelihood estimate.
CRLB: Cramér Rao lower bound.
$I(\theta)$: The Fisher information of a parameter $\theta$.
$x_k \sim \mathcal{N}(\mu, \sigma^2)$: The random variable $x_k$ is normally distributed with mean $\mu$ and variance $\sigma^2$.
$w_{i,k}$, $v_{i,k}$, $z_{i,k}$: $i$th elements of vectors $w_k$, $v_k$, $z_k$.
Estimates of variances of $w_{i,k}$ and $v_{i,k}$ at iteration $u$.
Estimates of state matrix A, covariances R and Q at iteration $u$.
$\lambda_i(A)$: The $i$th eigenvalue of A.
$A_i$, $C_i$: $i$th row of state-space matrices A and C.
$K_{i,k}$, $L_{i,k}$: $i$th row of predictor and filter gain matrices $K_k$ and $L_k$.
Additive term within the design Riccati difference equation to account for the presence of modelling error at time $k$ and iteration $u$.
$a_{ij}$: Element in row $i$ and column $j$ of A.
$C_k^{\#}$: Moore-Penrose pseudo-inverse of $C_k$.
A system (or map) that operates on the estimation problem inputs $i$ and produces the output estimation error at iteration $u$. It is convenient to factorise this system into the sum of two terms, where one depends on the filter or smoother solution and the other is a lower performance bound.


“We don’t know all the answers. If we knew all the answers we'd get bored, wouldn’t we? We keep 
looking, searching, trying to get more knowledge.” Jack LaLanne 
9. Robust Prediction, Filtering and Smoothing


9.1 Introduction 


The previously-discussed optimum predictor, filter and smoother solutions assume that the model parameters are correct, the noise processes are white and their associated covariances are known precisely. These solutions are optimal in a mean-square-error sense, that is, they provide the best average performance. If the above assumptions are correct, then the filter’s mean-square-error equals the trace of the design error covariance. The underlying modelling and noise assumptions are often a convenient fiction. They do, however, serve to allow estimated performance to be weighed against implementation complexity.


In general, robustness means “the persistence of a system’s characteristic 
behaviour under perturbations or conditions of uncertainty” [1]. In an estimation 
context, robust solutions refer to those that accommodate uncertainties in problem 
specifications. They are also known as worst-case or peak error designs. The 
standard predictor, filter and smoother structures are retained but a larger design 
error covariance is used to account for the presence of modelling error. 


Designs that cater for worst cases are likely to exhibit poor average performance. 
Suppose that a bridge designed for average loading conditions returns an 
acceptable cost benefit. Then a robust design that is focussed on accommodating 
infrequent peak loads is likely to provide worse cost performance. Similarly, a 
worst-case shoe design that accommodates rarely occurring large feet would 
provide poor fitting performance on average. That is, robust designs tend to be 
conservative. In practice, a trade-off may be desired between optimum and robust 
designs. 


The material canvassed herein is based on the $H_\infty$ filtering results from robust control. The robust control literature is vast, see [2] - [33] and the references therein. As suggested above, the $H_\infty$ solutions of interest here involve observers having gains that are obtained by solving Riccati equations. This Riccati equation solution approach relies on the Bounded Real Lemma, see the pioneering work


“On a huge hill, Cragged, and steep, Truth stands, and he that will Reach her, about must, and about 
must go.” John Donne 
by Vaidyanathan [2] and Petersen [3]. The Bounded Real Lemma is implicit in game theory [9] - [19]. Indeed, the continuous-time solutions presented in this section originate from the game theoretic approach of Doyle, Glover, Khargonekar, Francis, Limebeer, Anderson, Green, Theodore and Shaked, see [4], [13], [15], [21]. The discussed discrete-time versions stem from the results of Limebeer, Green, Walker, Yaesh, Shaked, Xie, de Souza and Wang, see [5], [11], [18], [19], [21]. In the parlance of game theory: “a statistician is trying to best estimate a linear combination of the states of a system that is driven by nature; nature is trying to cause the statistician’s estimate to be as erroneous as possible, while trying to minimise the energy it invests in driving the system” [19].


Pertinent state-space $H_\infty$ predictors, filters and smoothers are described in [4] - [19]. Some prediction, filtering and smoothing results are summarised in [13] and methods for accommodating model uncertainty are described in [14], [18], [19]. The aforementioned methods for handling model uncertainty can result in conservative designs (that depart far from optimality). This has prompted the use of linear matrix inequality solvers in [20], [23] to search for optimal solutions to model uncertainty problems.

It is explained in [15], [19], [21] that a saddle-point strategy for the games leads to robust estimators, and the resulting robust smoothing, filtering and prediction solutions are summarised below. While the solution structures remain unchanged, designers need to tweak the scalar $\gamma$ within the underlying Riccati equations.


This chapter has two main parts. Section 9.2 describes robust continuous-time 
solutions and the discrete-time counterparts are presented in Section 9.3. The 
previously discussed techniques each rely on a trick. The optimum filters and 
smoothers arise by completing the square. In maximum-likelihood estimation, a 
function is differentiated with respect to an unknown parameter and then set to 
zero. The trick behind the described robust estimation techniques is the Bounded 
Real Lemma, which opens the discussions. 


9.2 Robust Continuous-time Estimation 


9.2.3. Continuous-Time Bounded Real Lemma 


First, consider the unforced system 


$$\dot{x}(t) = A(t)x(t) \qquad (1)$$


“Uncertainty is one of the defining features of Science. Absolute proof only exists in mathematics. In 
the real world, it is impossible to prove that theories are right in every circumstance; we can only prove 
that they are wrong. This provisionality can cause people to lose faith in the conclusions of science, but 
it shouldn’t. The recent history of science is not one of well-established theories being proven wrong. 
Rather, it is of theories being gradually refined.” New Scientist vol. 212 no. 2835 
over a time interval $t \in [0, T]$, where $A(t) \in \mathbb{R}^{n \times n}$. For notational convenience, define the stacked vector $x = \{x(t),\, t \in [0, T]\}$. From Lyapunov stability theory [36], the system (1) is asymptotically stable if there exists a function $V(x(t)) > 0$ such that $\dot{V}(x(t)) < 0$. A possible Lyapunov function is $V(x(t)) = x^T(t)P(t)x(t)$, where $P(t) = P^T(t) \in \mathbb{R}^{n \times n}$ is positive definite. To ensure $x \in \mathcal{L}_2$ it is required to establish that

$$\dot{V}(x(t)) = \dot{x}^T(t)P(t)x(t) + x^T(t)\dot{P}(t)x(t) + x^T(t)P(t)\dot{x}(t) < 0. \qquad (2)$$


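Now consider the output of a linear time varying system, $y = \mathcal{G}w$, having the state-space representation

$$\dot{x}(t) = A(t)x(t) + B(t)w(t), \qquad (3)$$
$$y(t) = C(t)x(t), \qquad (4)$$

where $w(t) \in \mathbb{R}^m$, $B(t) \in \mathbb{R}^{n \times m}$ and $C(t) \in \mathbb{R}^{p \times n}$. Assume temporarily that $E\{w(t)w^T(\tau)\} = I\delta(t - \tau)$. The Bounded Real Lemma [13], [15], [21], states that $w \in \mathcal{L}_2$ implies $y \in \mathcal{L}_2$ if

$$\dot{V}(x(t)) + y^T(t)y(t) - \gamma^2 w^T(t)w(t) < 0 \qquad (5)$$

for a $\gamma \in \mathbb{R}$. Integrating (5) from $t = 0$ to $t = T$ gives

$$\int_0^T \dot{V}(x(t))\,dt + \int_0^T y^T(t)y(t)\,dt - \gamma^2\int_0^T w^T(t)w(t)\,dt < 0 \qquad (6)$$

and noting that $\int_0^T \dot{V}(x(t))\,dt = x^T(T)P(T)x(T) - x^T(0)P(0)x(0)$, another objective is

$$\frac{x^T(T)P(T)x(T) - x^T(0)P(0)x(0) + \int_0^T y^T(t)y(t)\,dt}{\int_0^T w^T(t)w(t)\,dt} < \gamma^2. \qquad (7)$$

Under the assumptions $x(0) = 0$ and $P(T) = 0$, the above inequality simplifies to

$$\frac{\|y\|_2^2}{\|w\|_2^2} = \frac{\int_0^T y^T(t)y(t)\,dt}{\int_0^T w^T(t)w(t)\,dt} < \gamma^2. \qquad (8)$$

The $\infty$-norm of $\mathcal{G}$ is defined as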


“Information is the resolution of uncertainty.” Claude Elwood Shannon 
$$\|\mathcal{G}\|_\infty = \frac{\|y\|_2}{\|w\|_2} = \frac{\|\mathcal{G}w\|_2}{\|w\|_2}. \qquad (9)$$

The Lebesgue $\infty$-space is the set of systems having finite $\infty$-norm and is denoted by $\mathcal{L}_\infty$. That is, $\mathcal{G} \in \mathcal{L}_\infty$ if there exists a $\gamma \in \mathbb{R}$ such that

$$\sup_{\|w\|_2 \neq 0} \|\mathcal{G}\|_\infty = \sup_{\|w\|_2 \neq 0} \frac{\|y\|_2}{\|w\|_2} < \gamma, \qquad (10)$$

namely, the supremum (or maximum) ratio of the output and input 2-norms is finite. The conditions under which $\mathcal{G} \in \mathcal{L}_\infty$ are specified below. The accompanying sufficiency proof combines the approaches of [15], [31]. A further five proofs for this important result appear in [21].


Lemma 1: The continuous-time Bounded Real Lemma [15], [13], [21]: In respect of the above system $\mathcal{G}$, suppose that the Riccati differential equation

$$-\dot{P}(t) = P(t)A(t) + A^T(t)P(t) + C^T(t)C(t) + \gamma^{-2}P(t)B(t)B^T(t)P(t), \qquad (11)$$

has a solution on $[0, T]$. Then $\|\mathcal{G}\|_\infty < \gamma$ for any $w \in \mathcal{L}_2$.

Proof: From (2) - (5),

$$\dot{V}(t) + y^T(t)y(t) - \gamma^2 w^T(t)w(t)$$
$$= x^T(t)C^T(t)C(t)x(t) - \gamma^2 w^T(t)w(t) + x^T(t)\dot{P}(t)x(t)$$
$$\quad + (A(t)x(t) + B(t)w(t))^T P(t)x(t) + x^T(t)P(t)(A(t)x(t) + B(t)w(t))$$
$$= -\gamma^{-2}x^T(t)P(t)B(t)B^T(t)P(t)x(t) - \gamma^2 w^T(t)w(t) + w^T(t)B^T(t)P(t)x(t) + x^T(t)P(t)B(t)w(t)$$
$$= -\gamma^2\big(w(t) - \gamma^{-2}B^T(t)P(t)x(t)\big)^T\big(w(t) - \gamma^{-2}B^T(t)P(t)x(t)\big),$$

which implies (6) and (7). Inequality (8) is established under the assumptions $x(0) = 0$ and $P(T) = 0$.

In general, where $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, the scaled matrix $\bar{B}(t) = B(t)Q^{1/2}(t)$ may be used in place of $B(t)$ above. When the plant $\mathcal{G}$ has a direct feedthrough matrix, that is,

$$y(t) = C(t)x(t) + D(t)w(t), \qquad (12)$$


“All exact science is dominated by the idea of approximation.” Earl Bertrand Arthur William 
$D(t) \in \mathbb{R}^{p \times m}$, the above Riccati differential equation is generalised to

$$-\dot{P}(t) = P(t)\big(A(t) + B(t)M^{-1}(t)D^T(t)C(t)\big) + \big(A(t) + B(t)M^{-1}(t)D^T(t)C(t)\big)^T P(t)$$
$$\quad + P(t)B(t)M^{-1}(t)B^T(t)P(t) + C^T(t)\big(I + D(t)M^{-1}(t)D^T(t)\big)C(t), \qquad (13)$$

where $M(t) = \gamma^2 I - D^T(t)D(t) > 0$. A proof is requested in the problems.

Criterion (8) indicates that the ratio of the system’s output and input energies is bounded above by $\gamma^2$ for any $w \in \mathcal{L}_2$, including worst-case $w$. Consequently, solutions satisfying (8) are often called worst-case designs.
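The link between the Riccati condition and the energy-gain bound can be checked on a simple case. The sketch below considers a stable, time-invariant, scalar system $G(s) = cb/(s - a)$ with $a < 0$ and assumes the steady-state form of (11), namely $0 = 2aP + c^2 + \gamma^{-2}b^2P^2$. It finds the smallest $\gamma$ for which this quadratic has a real solution and compares it with the peak of $|G(j\omega)|$ over a frequency grid. The function names and parameter values are hypothetical.

```python
import math

def smallest_gamma(a, b, c, lo=1e-3, hi=1e3, iters=200):
    """Bisection for the smallest gamma at which the scalar steady-state
    Riccati equation 0 = 2*a*P + c**2 + (b*P/gamma)**2 has a real solution P."""
    def solvable(gamma):
        # Discriminant of the quadratic in P must be non-negative.
        return (2.0 * a) ** 2 - 4.0 * (b / gamma) ** 2 * c ** 2 >= 0.0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)          # bisect on a logarithmic scale
        if solvable(mid):
            hi = mid
        else:
            lo = mid
    return hi

def peak_gain(a, b, c, omegas):
    """Largest |G(j*omega)| over a frequency grid for G(s) = c*b/(s - a)."""
    return max(abs(c * b / complex(-a, w)) for w in omegas)

a, b, c = -2.0, 3.0, 0.5
gamma_min = smallest_gamma(a, b, c)
omegas = [0.01 * i for i in range(100_000)]
print(gamma_min, peak_gain(a, b, c, omegas))
```

For this first-order example the two quantities agree, which illustrates that the smallest feasible $\gamma$ coincides with the worst-case ratio of output to input 2-norms.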


9.2.4 Continuous-Time $H_\infty$ Filtering
9.2.4.1 Problem Definition
Now that the Bounded Real Lemma has been stated, the $H_\infty$ filter can be set out.

The general filtering problem is depicted in Fig. 1. It is assumed that the system $\mathcal{G}_2$ has the state-space realisation

$$\dot{x}(t) = A(t)x(t) + B(t)w(t), \quad x(0) = 0, \qquad (14)$$
$$y_2(t) = C_2(t)x(t). \qquad (15)$$

Suppose that the system $\mathcal{G}_1$ has the realisation (14) and

$$y_1(t) = C_1(t)x(t). \qquad (16)$$

Fig. 1. The general filtering problem. The objective is to estimate the output of $\mathcal{G}_1$ from noisy measurements of $\mathcal{G}_2$.


“Although economists have studied the sensitivity of import and export volumes to changes in the 
exchange rate, there is still much uncertainty about just how much the dollar must change to bring 
about any given reduction in our trade deficit.” Martin Stuart Feldstein 
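It is desired to find a causal solution $\mathcal{H}$ that produces estimates $\hat{y}_1(t|t)$ of $y_1(t)$ from the measurements,

$$z(t) = y_2(t) + v(t), \qquad (17)$$

at time $t$ so that the output estimation error,

$$\tilde{y}(t|t) = y_1(t) - \hat{y}_1(t|t), \qquad (18)$$

is in $\mathcal{L}_2$. The error signal (18) is generated by a system denoted by $\tilde{y} = \mathcal{R}_{ei}\,i$, where $i = \begin{bmatrix} v \\ w \end{bmatrix}$ and $\mathcal{R}_{ei} = -[\mathcal{H} \quad \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1]$. Hence, the objective is to achieve $\int_0^T \tilde{y}^T(t|t)\tilde{y}(t|t)\,dt - \gamma^2\int_0^T i^T(t)i(t)\,dt < 0$ for some $\gamma \in \mathbb{R}$. For convenience, it is assumed here that $w(t) \in \mathbb{R}^m$, $E\{w(t)\} = 0$, $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, $v(t) \in \mathbb{R}^p$, $E\{v(t)\} = 0$, $E\{v(t)v^T(\tau)\} = R(t)\delta(t - \tau)$ and $E\{w(t)v^T(\tau)\} = 0$.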


9.2.4.2 $H_\infty$ Solution

A parameterisation of all solutions for the $H_\infty$ filter is developed in [21]. A minimum-entropy filter arises when the contractive operator within [21] is zero and is given by

$$\dot{\hat{x}}(t|t) = \big(A(t) - K(t)C_2(t)\big)\hat{x}(t|t) + K(t)z(t), \quad \hat{x}(0) = 0, \qquad (19)$$
$$\hat{y}_1(t|t) = C_1(t)\hat{x}(t|t), \qquad (20)$$

where

$$K(t) = P(t)C_2^T(t)R^{-1}(t) \qquad (21)$$

is the filter gain and $P(t) = P^T(t) > 0$ is the solution of the Riccati differential equation

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t) - P(t)\big(C_2^T(t)R^{-1}(t)C_2(t) - \gamma^{-2}C_1^T(t)C_1(t)\big)P(t), \quad P(0) = 0. \qquad (22)$$

It can be seen that the $H_\infty$ filter has a structure akin to the Kalman filter. A point of difference is that the solution to the above Riccati differential equation depends on $C_1(t)$, the linear combination of states being estimated.


“Uncertainty and expectation are the joys of life. Security is an insipid thing, through the overtaking 
and possessing of a wish discovers the folly of the chase.” William Congreve 
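9.2.4.3 Properties

Define $\bar{A}(t) = A(t) - K(t)C_2(t)$. Subtracting (19) - (20) from (14) - (15) yields the error system

$$\begin{bmatrix} \dot{\tilde{x}}(t|t) \\ \tilde{y}(t|t) \end{bmatrix} = \begin{bmatrix} \bar{A}(t) & [-K(t) \;\; B(t)] \\ C_1(t) & [0 \;\; 0] \end{bmatrix} \begin{bmatrix} \tilde{x}(t|t) \\ \begin{bmatrix} v(t) \\ w(t) \end{bmatrix} \end{bmatrix}, \quad \tilde{x}(0) = 0, \qquad (23)$$

that is, $\tilde{y} = \mathcal{R}_{ei}\,i$, where $\tilde{x}(t|t) = x(t) - \hat{x}(t|t)$ and $\mathcal{R}_{ei} = \begin{bmatrix} \bar{A}(t) & [-K(t) \;\; B(t)] \\ C_1(t) & [0 \;\; 0] \end{bmatrix}$. The adjoint of $\mathcal{R}_{ei}$ is given by

$$\mathcal{R}_{ei}^H = \begin{bmatrix} \bar{A}^T(t) & C_1^T(t) \\ \begin{bmatrix} -K^T(t) \\ B^T(t) \end{bmatrix} & \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{bmatrix}.$$

It is shown below that the estimation error satisfies the desired performance objective.

Lemma 2: In respect of the $H_\infty$ problem (14) - (18), the solution (19) - (20) achieves the performance $x^T(T)P(T)x(T) - x^T(0)P(0)x(0) + \int_0^T \tilde{y}^T(t|t)\tilde{y}(t|t)\,dt - \gamma^2\int_0^T i^T(t)i(t)\,dt < 0$.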


Proof: Following the approach in [15], [21], by applying Lemma 1 to the adjoint of (23), it is required that there exists a positive definite symmetric solution to

$$-\dot{P}(\tau) = \bar{A}(\tau)P(\tau) + P(\tau)\bar{A}^T(\tau) + B(\tau)Q(\tau)B^T(\tau) + K(\tau)R(\tau)K^T(\tau) + \gamma^{-2}P(\tau)C_1^T(\tau)C_1(\tau)P(\tau), \quad P(\tau)\big|_{\tau = T} = 0,$$

on $[0, T]$ for some $\gamma \in \mathbb{R}$, in which $\tau = T - t$ is a time-to-go variable. Substituting $K(\tau) = P(\tau)C_2^T(\tau)R^{-1}(\tau)$ into the above Riccati differential equation yields

$$-\dot{P}(\tau) = A(\tau)P(\tau) + P(\tau)A^T(\tau) + B(\tau)Q(\tau)B^T(\tau) - P(\tau)\big(C_2^T(\tau)R^{-1}(\tau)C_2(\tau) - \gamma^{-2}C_1^T(\tau)C_1(\tau)\big)P(\tau), \quad P(\tau)\big|_{\tau = T} = 0.$$

Taking adjoints to address the problem (23) leads to (22), for which the existence of a positive definite solution implies $x^T(T)P(T)x(T) - x^T(0)P(0)x(0)\; +$


“Entrepreneurs must devote a portion of their minds to constantly processing uncertainty. So you 
sacrifice a degree of being present.” Scott Belsky 
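$\int_0^T \tilde{y}^T(t|t)\tilde{y}(t|t)\,dt - \gamma^2\int_0^T i^T(t)i(t)\,dt < 0$. Thus, under the assumption $x(0) = 0$, $\int_0^T \tilde{y}^T(t|t)\tilde{y}(t|t)\,dt - \gamma^2\int_0^T i^T(t)i(t)\,dt < -x^T(T)P(T)x(T) < 0$. Therefore, $\mathcal{R}_{ei} \in \mathcal{L}_\infty$, that is, $w, v \in \mathcal{L}_2 \Rightarrow \tilde{y} \in \mathcal{L}_2$.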


9.2.4.4 Trading-Off $H_\infty$ Performance

In a robust filter design it is desired to meet an $H_\infty$ performance objective for a minimum possible $\gamma$. A minimum $\gamma$ can be found by conducting a search and checking for the existence of positive definite solutions to the Riccati differential equation (22). This search is tractable because $\dot{P}(t)$ is a convex function of $\gamma^2$, since $\frac{\partial^2 \dot{P}(t)}{\partial(\gamma^2)^2} = 2\gamma^{-6}P(t)C_1^T(t)C_1(t)P(t) > 0$.

In some applications it may be possible to estimate a priori values for $\gamma$. Recall for output estimation problems that the error is generated by $\tilde{y} = \mathcal{R}_{ei}\,i$, where $\mathcal{R}_{ei} = -[\mathcal{H} \quad (\mathcal{H} - I)\mathcal{G}]$. From the arguments of Chapters 1 - 2 and [28], for single-input-single-output plants $\lim_{\sigma_v^2 \to 0}|\mathcal{H}| = 1$ and $\lim_{\sigma_v^2 \to 0}\|\mathcal{R}_{ei}\|_\infty = 1$, which implies $\lim_{\sigma_v^2 \to 0}\|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_\infty = \sigma_v^2$. Since the $H_\infty$ filter achieves the performance $\|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_\infty < \gamma^2$, it follows that an a priori design estimate is $\gamma = \sigma_v$ at high signal-to-noise ratios.


When the problem is stationary (or time-invariant), the filter gain is precalculated as $K = PC_2^TR^{-1}$, where P is the solution of the algebraic Riccati equation

$$0 = AP + PA^T - P\big(C_2^TR^{-1}C_2 - \gamma^{-2}C_1^TC_1\big)P + BQB^T. \qquad (24)$$


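Suppose that $\mathcal{G}_2 = \mathcal{G}_1$ is a time-invariant single-input-single-output system and let $R_{ei}(s)$ denote the transfer function of $\mathcal{R}_{ei}$. Then Parseval’s Theorem states that the average total energy of $\tilde{y}(t|t)$ is

$$\|\tilde{y}(t|t)\|_2^2 = \int_{-\infty}^{\infty}|\tilde{y}(t|t)|^2\,dt = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} R_{ei}R_{ei}^H(s)\,ds = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}|R_{ei}(s)|^2\,ds, \qquad (25)$$

which equals the area under the error power spectral density, $R_{ei}R_{ei}^H(s)$. Recall that the optimal filter (in which $\gamma^{-2} = 0$) minimises (25), whereas the $H_\infty$ filter minimises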


“Life is uncertain. Eat dessert first.” Ernestine Ulmer 
$$\|R_{ei}R_{ei}^H\|_\infty = \sup_{\|i\|_2 \neq 0}\frac{\|\tilde{y}\|_2^2}{\|i\|_2^2} = \sup_{\|i\|_2 \neq 0}\frac{\|\mathcal{R}_{ei}\,i\|_2^2}{\|i\|_2^2}. \qquad (26)$$

In view of (25) and (26), it follows that the $H_\infty$ filter minimises the maximum magnitude of $R_{ei}R_{ei}^H(s)$. Consequently, it is also called a ‘minimax filter’. However, robust designs, which accommodate uncertain inputs, tend to be conservative. Therefore, it is prudent to investigate using a larger $\gamma$ to achieve a trade-off between $H_\infty$ and minimum-mean-square-error performance criteria.


Fig. 2. $|R_{ei}R_{ei}^H(s)|$ versus frequency (Hz) for Example 1: optimal filter (solid line) and $H_\infty$ filter (dotted line).


Example 1. Consider a time-invariant output estimation problem where $A = -1$, $B = C_2 = C_1 = 1$, $\sigma_w^2 = 10$ and $\sigma_v^2 = 0.1$. The magnitude of the error spectrum exhibited by the optimal filter (designed with a sufficiently large $\gamma^2$, so that $\gamma^{-2} \approx 0$) is indicated by the solid line of Fig. 2. From a search, a minimum of $\gamma^2 = 0.099$ was found such that the algebraic Riccati equation (24) has a positive definite solution, which concurs with the a priori estimate of $\gamma^2 \approx \sigma_v^2$. The magnitude of the error spectrum exhibited by the $H_\infty$ filter is indicated by the dotted line of Fig. 2. The figure demonstrates that the filter achieves $|R_{ei}R_{ei}^H(s)| < \gamma^2$. Although the $H_\infty$ filter reduces the peak of the error spectrum by 10 dB, it can be seen that the area under the curve is larger, that is, the mean square error increases. Consequently, some intermediate value of $\gamma$ may need to be considered to trade off peak error (spectrum) and average error performance.
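The search described in Example 1 can be sketched for the scalar case. The code below assumes the scalar form of (24), $0 = 2AP - (C_2^2/R - C_1^2/\gamma^2)P^2 + B^2Q$, takes the smallest positive root as the design solution, and uses bisection to locate the smallest $\gamma^2$ for which such a root exists. It then evaluates the peak and the area of the error spectrum, assuming the closed-form spectrum $C_1^2(K^2R + B^2Q)/(\omega^2 + (KC_2 - A)^2)$ that follows from the filter (19) - (21) with constant gain $K = PC_2/R$. The function names and the bisection bounds are hypothetical.

```python
import math

A, B, C1, C2 = -1.0, 1.0, 1.0, 1.0
Q, R = 10.0, 0.1          # sigma_w^2 and sigma_v^2 of Example 1

def are_solution(gamma2):
    """Smallest positive root P of the scalar form of (24), or None."""
    k2 = -(C2 * C2 / R - C1 * C1 / gamma2)   # coefficient of P^2
    k1 = 2.0 * A                              # coefficient of P
    k0 = B * B * Q                            # constant term
    if abs(k2) < 1e-12:                       # quadratic degenerates to linear
        roots = [-k0 / k1]
    else:
        disc = k1 * k1 - 4.0 * k2 * k0
        if disc < 0.0:
            return None
        s = math.sqrt(disc)
        roots = [(-k1 - s) / (2.0 * k2), (-k1 + s) / (2.0 * k2)]
    positive = [p for p in roots if p > 0.0]
    return min(positive) if positive else None

def min_gamma2(lo=0.01, hi=1.0, iters=60):
    """Bisection for the smallest gamma^2 at which (24) has a positive solution."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if are_solution(mid) is None:
            lo = mid
        else:
            hi = mid
    return hi

def peak_and_mse(p):
    """Peak of the error spectrum and its area (the MSE) for K = P*C2/R."""
    k = p * C2 / R
    pole = k * C2 - A                         # error dynamics pole at s = -pole
    numerator = C1 * C1 * (k * k * R + B * B * Q)
    return numerator / (pole * pole), numerator / (2.0 * pole)

g2 = min_gamma2()
peak_opt, mse_opt = peak_and_mse(are_solution(1e8))   # gamma^-2 close to zero
peak_hinf, mse_hinf = peak_and_mse(are_solution(g2))
print(f"minimum gamma^2 = {g2:.4f}")
print(f"optimal: peak {peak_opt:.3f}, MSE {mse_opt:.3f}")
print(f"H-infinity: peak {peak_hinf:.3f}, MSE {mse_hinf:.3f}")
```

Under these assumptions the minimum $\gamma^2$ lies just below $\sigma_v^2 = 0.1$, and the design at that value lowers the spectral peak while enlarging the area under the spectrum, which is the trade-off discussed above.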


“If the uncertainty is larger than the effect, the effect itself becomes moot.” Patrick Frank
9.2.5 Accommodating Uncertainty

The above filters are designed for situations in which the inputs $v(t)$ and $w(t)$ are uncertain. Next, problems in which model uncertainty is present are discussed. The described approaches involve converting the uncertainty into a fictitious noise source and solving an auxiliary $H_\infty$ filtering problem.

Fig. 3. Representation of additive model uncertainty.

Fig. 4. Input scaling in lieu of a problem that possesses an uncertainty.


9.2.5.1 Additive Uncertainty 


Consider a time-invariant output estimation problem in which the nominal model is $\mathcal{G}_2 + \Delta$, where $\mathcal{G}_2$ is known and $\Delta$ is unknown, as depicted in Fig. 3. The $p(t)$ represents a fictitious signal to account for discrepancies due to the uncertainty. It is argued below that a solution to the $H_\infty$ filtering problem can be found by solving an auxiliary problem in which the input is scaled by $\varepsilon \in \mathbb{R}$ as shown in Fig. 4. In lieu of the filtering problem possessing the uncertainty $\Delta$, an auxiliary problem is defined as

$$\dot{x}(t) = Ax(t) + Bw(t) + Bp(t), \quad x(0) = 0, \qquad (27)$$
$$z(t) = C_2(t)x(t) + v(t), \qquad (28)$$
$$\tilde{y}(t|t) = C_1(t)x(t) - C_1(t)\hat{x}(t|t), \qquad (29)$$

where $p(t)$ is an additional exogenous input satisfying

$$\|p\|_2^2 \le \delta^2\|w\|_2^2, \quad \delta \in \mathbb{R}. \qquad (30)$$

Consider the scaled $H_\infty$ filtering problem where

$$\dot{x}(t) = Ax(t) + B\varepsilon w(t), \quad x(0) = 0, \qquad (31)$$
$$z(t) = C_2(t)x(t) + v(t), \qquad (32)$$
$$\tilde{y}(t|t) = C_1(t)x(t) - C_1(t)\hat{x}(t|t), \qquad (33)$$

“A theory has only the alternative of being right or wrong. A model has a third possibility - it may be 
right but irrelevant.” Manfred Eigen 
in which $\varepsilon^2 = (1 + \delta^2)^{-1}$.

Lemma 3 [26]: Suppose for a $\gamma \neq 0$ that the scaled $H_\infty$ problem (31) - (33) is solvable, that is, $\|\tilde{y}\|_2^2 < \gamma^2(\varepsilon^{-2}\|w\|_2^2 + \|v\|_2^2)$. Then, this guarantees the performance

$$\|\tilde{y}\|_2^2 < \gamma^2\big(\|w\|_2^2 + \|p\|_2^2 + \|v\|_2^2\big) \qquad (34)$$

for the solution of the auxiliary problem (27) - (29).

Proof: From the assumption that problem (31) - (33) is solvable, it follows that $\|\tilde{y}\|_2^2 < \gamma^2(\varepsilon^{-2}\|w\|_2^2 + \|v\|_2^2)$. Substituting for $\varepsilon$, using (30) and rearranging yields (34).


9.2.5.2 Multiplicative Uncertainty 


Next, consider a filtering problem in which the model is $\mathcal{G}_2(I + \Delta)$, as depicted in Fig. 5. It is again assumed that $\mathcal{G}_2$ and $\Delta$ are known and unknown transfer function matrices, respectively. This problem may similarly be solved using Lemma 3. Thus a filter that accommodates additive or multiplicative uncertainty simply requires scaling of an input. The above scaling is only sufficient for an $H_\infty$ performance criterion to be met. The design may well be too conservative and it is worthwhile to explore the merits of using values for $\delta$ less than the uncertainty’s assumed norm bound.


9.2.5.3 Parametric Uncertainty 


Finally, consider a time-invariant output estimation problem in which the state matrix is uncertain, namely,

$$\dot{x}(t) = (A + \Delta_A)x(t) + Bw(t), \quad x(0) = 0, \qquad (35)$$
$$z(t) = C_2(t)x(t) + v(t), \qquad (36)$$
$$\tilde{y}(t|t) = C_1(t)x(t) - C_1(t)\hat{x}(t|t), \qquad (37)$$

where $\Delta_A \in \mathbb{R}^{n \times n}$ is unknown. Define an auxiliary $H_\infty$ filtering problem by

$$\dot{x}(t) = Ax(t) + Bw(t) + p(t), \quad x(0) = 0, \qquad (38)$$

(36) and (37), where $p(t) = \Delta_A x(t)$ is a fictitious exogenous input. A solution to this problem would achieve


“Remember that all models are wrong; the practical question is how wrong do they have to be to not be 
useful.” George Edward Pelham Box 
$$\|\tilde{y}\|_2^2 \le \gamma^2 \left( \|w\|_2^2 + \|p\|_2^2 + \|v\|_2^2 \right) \tag{39}$$
for a $\gamma \ne 0$. From the approach of [14], [18], [19], consider the scaled filtering
problem


$$\dot{x}(t) = Ax(t) + \bar{B}\bar{w}(t), \quad x(0) = 0, \tag{40}$$


(36), (37), where $\bar{B} = \begin{bmatrix} B & \varepsilon^{-1}I \end{bmatrix}$, $\bar{w}(t) = \begin{bmatrix} w(t) \\ \varepsilon p(t) \end{bmatrix}$ and $0 < \varepsilon < 1$. Then the
solution of this $H_\infty$ filtering problem satisfies


$$\|\tilde{y}\|_2^2 \le \gamma^2 \left( \|w\|_2^2 + \varepsilon^2\|p\|_2^2 + \|v\|_2^2 \right), \tag{41}$$


which implies (39). Thus, state matrix parameter uncertainty can be
accommodated by including a scaled input in the solution of an auxiliary $H_\infty$
filtering problem. Similar solutions to problems in which other state-space
parameters are uncertain appear in [18], [19], [14].
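
The step from (41) to (39) can be written out as a short worked sketch. It assumes that the second block of $\bar{B}$ is $\varepsilon^{-1}I$, so that the scaled problem (40) reproduces the auxiliary problem (38); that block structure is an assumption made here for illustration.

```latex
\bar{B}\,\bar{w}(t)
  = \begin{bmatrix} B & \varepsilon^{-1} I \end{bmatrix}
    \begin{bmatrix} w(t) \\ \varepsilon\, p(t) \end{bmatrix}
  = B w(t) + p(t),
\qquad
\gamma^{2}\left( \|w\|_2^2 + \varepsilon^{2}\|p\|_2^2 + \|v\|_2^2 \right)
  \le \gamma^{2}\left( \|w\|_2^2 + \|p\|_2^2 + \|v\|_2^2 \right)
  \quad \text{for } 0 < \varepsilon < 1 .
```

The first identity shows that the state trajectories of (38) and (40) coincide, and the inequality shows that a bound of the form (41) is at least as tight as (39).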


9.2.6 Continuous-Time $H_\infty$ Smoothing


9.2.6.1 Background 


There are three kinds of $H_\infty$ smoothers: fixed point, fixed lag and fixed interval
(see the tutorial [13]). The next development is concerned with continuous-time
$H_\infty$ fixed-interval smoothing. The smoother in [10] arises as a combination of
forward states from an $H_\infty$ filter and adjoint states that evolve according to a
Hamiltonian matrix. A different fixed-interval smoothing problem to [10] is found
in [16] by solving for saddle conditions within differential games. A summary of
some filtering and smoothing results appears in [13]. Robust prediction, filtering
and smoothing problems are addressed in [22]; the $H_\infty$ predictor, filter and
smoother require the solution of a Riccati differential equation that evolves
forward in time, whereas the smoother additionally requires another to be solved
in reverse-time. Another approach for combining forward and adjoint estimates is
described in [32], where the Fraser-Potter formula is used to construct a smoothed
estimate.


Continuous-time, fixed-interval smoothers that differ from the formulations within 
[10], [13], [16], [22], [32] are reported in [34] — [35]. A robust version of [34] — 
[35] appears in [33], which is described below. 


“The purpose of models is not to fit the data but to sharpen the questions.” Samuel Karlin 
Fig. 5. Representation of multiplicative model uncertainty.
Fig. 6. Robust smoother error structure.


9.2.6.2 Problem Definition 


Once again, it is assumed that the data is generated by (14) – (17). For
convenience, attention is confined to output estimation, namely $\mathcal{G}_2 = \mathcal{G}_1$ within
Fig. 1. Input and state estimation problems can be handled similarly using the
solution structures described in Chapter 6. It is desired to find a fixed-interval
smoother solution $\mathcal{H}$ that produces estimates $\hat{y}_1(t|T)$ of $y_1(t)$ so that the output
estimation error

$$\tilde{y}(t|T) = y_1(t) - \hat{y}_1(t|T) \tag{42}$$

is in $\mathcal{L}_2$. As before, the map from the inputs $i = \begin{bmatrix} v \\ w \end{bmatrix}$ to the error is denoted by
$\mathcal{R}_{\tilde{y}i} = \begin{bmatrix} \mathcal{H} & \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1 \end{bmatrix}$ and the objective is to achieve
$\int_0^T \tilde{y}^T(t|T)\tilde{y}(t|T)\,dt - \gamma^2 \int_0^T i^T(t)i(t)\,dt < 0$ for some $\gamma \in \mathbb{R}$.


9.2.6.3 $H_\infty$ Solution


The following $H_\infty$ fixed-interval smoother exploits the structure of the minimum-variance
smoother but uses the gain (21) calculated from the solution of the
Riccati differential equation (22) akin to the $H_\infty$ filter. An approximate Wiener-Hopf
factor inverse, $\hat{\Delta}^{-1}$, is given by


$$\begin{bmatrix} \dot{\hat{x}}(t) \\ \alpha(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & K(t) \\ -R^{-1/2}(t)C(t) & R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \hat{x}(t) \\ z(t) \end{bmatrix}. \tag{43}$$


An inspection reveals that the states within (43) are the same as those calculated
by the $H_\infty$ filter (19). The adjoint of $\hat{\Delta}^{-1}$, which is denoted by $\hat{\Delta}^{-H}$, has the
realisation


“Certainty is the mother of quiet and repose, and uncertainty the cause of variance and contentions.” 
Edward Coke 
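$$\begin{bmatrix} -\dot{\xi}(t) \\ \beta(t) \end{bmatrix} = \begin{bmatrix} A^T(t) - C^T(t)K^T(t) & -C^T(t)R^{-1/2}(t) \\ K^T(t) & R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \xi(t) \\ \alpha(t) \end{bmatrix}. \tag{44}$$


Output estimates are obtained as


$$\hat{y}(t|T) = z(t) - R(t)\beta(t). \tag{45}$$


However, an additional condition requires checking in order to guarantee that the
smoother actually achieves the above performance objective; the existence of a
solution $P_2(t) = P_2^T(t) > 0$ is required for the auxiliary Riccati differential
equation


$$\begin{aligned} -\dot{P}_2(t) = {} & \bar{A}(t)P_2(t) + P_2(t)\bar{A}^T(t) + K(t)R^2(t)K^T(t) \\ & + \gamma_2^{-2}P_2(t)C^T(t)R^{-1}(t)\left(C(t)P(t)C^T(t) + R(t)\right)R^{-1}(t)C(t)P_2(t), \quad P_2(T) = 0, \end{aligned} \tag{46}$$


where $\bar{A}(t) = A(t) - K(t)C(t)$.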


9.2.7 Performance 
It will be shown subsequently that the robust fixed-interval smoother (43) — (45) 


has the error structure shown in Fig. 6, which is examined below. 


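Lemma 4 [33]: Consider the arrangement of two linear systems $f = \mathcal{R}_{fi}\,i$ and $u = \mathcal{R}_{uj}\,j$
shown in Fig. 6, in which $i = \begin{bmatrix} v \\ w \end{bmatrix}$ and $j = \begin{bmatrix} f \\ v \end{bmatrix}$. Let $\mathcal{R}_{\tilde{y}i}$ denote the map
from $i$ to $\tilde{y}$. Assume that $w$ and $v \in \mathcal{L}_2$. If and only if: (i) $\mathcal{R}_{fi} \in \mathcal{L}_\infty$ and (ii) $\mathcal{R}_{uj} \in \mathcal{L}_\infty$,
then (i) $f$, $u$, $\tilde{y} \in \mathcal{L}_2$ and (ii) $\mathcal{R}_{\tilde{y}i} \in \mathcal{L}_\infty$.


Proof: (i) To establish sufficiency, note that $\|i\|_2 \le \|w\|_2 + \|v\|_2 \Rightarrow i \in \mathcal{L}_2$,
which with Condition (i) $\Rightarrow f \in \mathcal{L}_2$. Similarly, $\|j\|_2 \le \|f\|_2 + \|v\|_2 \Rightarrow j \in \mathcal{L}_2$,
which with Condition (ii) $\Rightarrow u \in \mathcal{L}_2$. Also, $\|\tilde{y}\|_2 \le \|f\|_2 + \|u\|_2 \Rightarrow \tilde{y} \in \mathcal{L}_2$. The
necessity of (i) follows from the assumption $i \in \mathcal{L}_2$ together with the property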


“For, Mathematical Demonstrations being built upon the impregnable Foundations of Geometry and 
Arithmetick, are the only Truths, that can sink into the Mind of Man, void of all Uncertainty; and all 
other Discourses participate more or less of Truth, according as their Subjects are more or less capable 
of Mathematical Demonstration.” Sir Christopher Wen 
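$\mathcal{R}_{fi}\mathcal{L}_2 \subset \mathcal{L}_2 \Rightarrow \mathcal{R}_{fi} \in \mathcal{L}_\infty$ (see [p. 83, 21]). Similarly, $j \in \mathcal{L}_2$ together with the
property $\mathcal{R}_{uj}\mathcal{L}_2 \subset \mathcal{L}_2 \Rightarrow \mathcal{R}_{uj} \in \mathcal{L}_\infty$.


(ii) Finally, $i \in \mathcal{L}_2 \Rightarrow \tilde{y} \in \mathcal{L}_2$ together with the property $\mathcal{R}_{\tilde{y}i}\mathcal{L}_2 \subset \mathcal{L}_2 \Rightarrow \mathcal{R}_{\tilde{y}i} \in \mathcal{L}_\infty$.


It is easily shown that the error system, $\mathcal{R}_{\tilde{y}i}$, for the model (14) – (15), the data
(17) and the smoother (43) – (45), is given by


$$\begin{bmatrix} \dot{\tilde{x}}(t|t) \\ -\dot{\xi}(t) \\ \tilde{y}(t|T) \end{bmatrix} = \begin{bmatrix} \bar{A}(t) & 0 & B(t) & -K(t) \\ -C^T(t)R^{-1}(t)C(t) & \bar{A}^T(t) & 0 & -C^T(t)R^{-1}(t) \\ C(t) & R(t)K^T(t) & 0 & 0 \end{bmatrix} \begin{bmatrix} \tilde{x}(t|t) \\ \xi(t) \\ w(t) \\ v(t) \end{bmatrix}, \tag{47}$$


$\tilde{x}(0) = 0$, $\xi(T) = 0$,


where $\tilde{x}(t|t) = x(t) - \hat{x}(t|t)$ and $\bar{A}(t) = A(t) - K(t)C(t)$. The conditions for the smoother attaining the
desired performance objective are described below.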

Lemma 5 [33]: In respect of the smoother error system (47), if there exist
symmetric positive definite solutions to (22) and (46) for $\gamma$, $\gamma_2 > 0$, then the
smoother (43) – (45) achieves $\mathcal{R}_{\tilde{y}i} \in \mathcal{L}_\infty$, that is, $i \in \mathcal{L}_2$ implies $\tilde{y} \in \mathcal{L}_2$.


Proof: Since $\tilde{x}(t|t)$ is decoupled from $\xi(t)$, $\mathcal{R}_{\tilde{y}i}$ is equivalent to the arrangement
of two systems $\mathcal{R}_{fi}$ and $\mathcal{R}_{uj}$ shown in Fig. 6. The $\mathcal{R}_{fi}$ is defined by (23) in which
$C_2(t) = C(t)$. From Lemma 2, the existence of a positive definite solution to (22)
implies $\mathcal{R}_{fi} \in \mathcal{L}_\infty$. The $\mathcal{R}_{uj}$ is given by the system


$$-\dot{\xi}(\tau) = \bar{A}^T(\tau)\xi(\tau) - C^T(\tau)R^{-1}(\tau)\tilde{y}(\tau|\tau) - C^T(\tau)R^{-1}(\tau)v(\tau), \quad \xi(T) = 0, \tag{48}$$
$$u(\tau) = R(\tau)K^T(\tau)\xi(\tau). \tag{49}$$


For the above system to be in $\mathcal{L}_\infty$, from Lemma 4, it is required that there exists a
solution to (46) for which the existence of a positive definite solution implies
$\mathcal{R}_{uj} \in \mathcal{L}_\infty$. The claim $\mathcal{R}_{\tilde{y}i} \in \mathcal{L}_\infty$ follows from Lemma 4.

The $H_\infty$ solution can be derived as a solution to a two-point boundary value
problem, which involves a trade-off between causal and noncausal processes (see
[10], [15], [21]). This suggests that the $H_\infty$ performance of the above smoother
would not improve on that of the filter. Indeed, from Fig. 6, $\tilde{y} = f + u$ and the


“Doubt is uncomfortable, certainty is ridiculous.” Frangois-Marie Arouet de Voltaire 
triangle inequality yields $\|\tilde{y}\|_2 \le \|f\|_2 + \|u\|_2$, where $f$ is the $H_\infty$ filter error. That is, the
error upper bound for the $H_\infty$ fixed-interval smoother (43) – (45) is greater than
that for the $H_\infty$ filter (19) – (20). It is observed below that compared to the
minimum-variance case, the $H_\infty$ solution exhibits an increased mean-square error.


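Lemma 6 [33]: For the output estimation problem (14) – (18), in which $C_2(t) = C_1(t) = C(t)$,
the smoother solution (43) – (45) results in


$$\left\| \mathcal{R}_{\tilde{y}i}\mathcal{R}_{\tilde{y}i}^H \right\|_2 \Big|_{\gamma^{-2} > 0} > \left\| \mathcal{R}_{\tilde{y}i}\mathcal{R}_{\tilde{y}i}^H \right\|_2 \Big|_{\gamma^{-2} = 0}. \tag{50}$$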


Proof: By expanding RR, and completing the squares, it can be shown that 
RR = RPGs + FeoPG» where RoR = GOOG" - 
HA — GOANG"A" = [H -—I+ 


GOOG A "A'GONHG" and RF, 
R(t(AA") JA. Substituting H =I -— R(t(AA")" into R,,, yields 


vil ~~ 
7 


Ry = RDU(AA“)' -(AA") "JA, (51) 
which suggests A = CZK (R(t) + R'?(t), where G, denotes an operator 
having the state-space realisation ia Al Constructing AA® = 
COA[KMORMK' (th) — POA - AOPOIG"C'(O + RW and using 
(22) yields AA" = CHG [B(A)O(t)B’ (t) - Pt) + 
xy POC’ ODCHOPOIGC'() + Rit). Comparison with AA" = 
CHABHOANB’ HGNC" (t) + Rip leads to AA" = AA" — CHG, (PO) + 
y°P(t)C ()C(NP(t) GC" (t) . Substituting for AA" into (51) yields 


Ri, = R(N[(AA")' -(AA" —C(NG(PO 
— FPOCTOCOPY)G'C WO) A. (52) 
The observation (50) follows by inspection of (52). 


Thus, the cost of designing for worst case input conditions is a deterioration in the 
mean peer Note that the best possible average performance 
RoR, =[R 


jin Vi 


He ee » can be attained in problems where there are no uncertainties 


“T am not accustomed to saying anything with certainty after only one or two observations.” Andreas 
Vesalius 
present, $\gamma^{-2} = 0$ and the Riccati equation solution has converged, that is,
$\dot{P}(t) = 0$, in which case $\hat{\Delta}\hat{\Delta}^H = \Delta\Delta^H$ and $\mathcal{R}_{\tilde{y}i1}$ is a zero matrix.


9.2.8 Performance Comparison 


It is of interest to compare the performance of (43) – (45) with the $H_\infty$
smoother described in [10], [13], [16], namely,


eae ACO) eee el 0 jx 
s) | LOCOROCO A) G0) | [CORO] 


(53) 
X(t] T) = xX) + PIS) (54) 


and (22). Substituting (54) and its differential into the first row of (53) together
with (21) yields


$$\dot{\hat{x}}(t) = A(t)\hat{x}(t) + K(t)\left(z(t) - C(t)\hat{x}(t)\right), \tag{55}$$


which reverts to the Kalman filter at $\gamma^{-2} = 0$. Substituting $\xi(t) = P^{-1}(t)\left(\hat{x}(t|T) - \hat{x}(t)\right)$
into the second row of (53) yields


$$\dot{\hat{x}}(t|T) = A(t)\hat{x}(t|T) + G(t)\left(\hat{x}(t|T) - \hat{x}(t)\right), \tag{56}$$


where $G(t) = B(t)Q(t)B^T(t)P^{-1}(t)$, which reverts to the maximum-likelihood
smoother at $\gamma^{-2} = 0$. Thus, the Hamiltonian form (53) – (54) can be realised by
calculating the filtered estimate (55) and then obtaining the smoothed estimate
from (56).


Rape aS | eGo yest Saar 
‘xample: , Let A = ,B=C=Q= ,D= and R = 
. 0 -1 01 0 0 


2 
o, 0 lhe ‘ oa as 
i denote time-invariant parameters for an output estimation problem. 
oO 


Simulations were conducted for the case of $T$ = 100 seconds, $\delta t$ = 1 millisecond,
using 500 realisations of zero-mean, Gaussian process noise and measurement
noise. The resulting mean-square-error (MSE) versus signal-to-noise ratio (SNR)
is shown in Fig. 7. The $H_\infty$ solutions were calculated using a priori designs of

“We know accurately only when we know little, with knowledge doubt increases.” Johann Wolfgang 
von Goethe 
$\gamma^{-2} = \sigma_v^{-2}$ within (22). It can be seen from trace (vi) of Fig. 7 that the $H_\infty$
smoothers exhibit poor performance when the exogenous inputs are in fact
Gaussian, which illustrates Lemma 6. The figure demonstrates that the minimum-variance
smoother out-performs the maximum-likelihood smoother. However, at
high SNR, the difference in smoother performance is inconsequential.
Intermediate values for $\gamma^{-2}$ may be selected to realise a smoother design that
achieves a trade-off between minimum-variance performance (trace (iii)) and $H_\infty$
performance (trace (v)).


[Figs. 7 and 8: plots of MSE, dB versus SNR, dB, traces (i) to (vi).]


Fig. 7. Fixed-interval smoother performance
comparison for Gaussian process noise: (i)
Kalman filter; (ii) Maximum likelihood
smoother; (iii) Minimum-variance smoother; (iv)
$H_\infty$ filter; (v) $H_\infty$ smoother [10], [13], [16]; and
(vi) $H_\infty$ smoother (43) – (45).


Fig. 8. Fixed-interval smoother performance
comparison for sinusoidal process noise: (i)
Kalman filter; (ii) Maximum likelihood
smoother; (iii) Minimum-variance smoother; (iv)
$H_\infty$ filter; (v) $H_\infty$ smoother [10], [13], [16]; and
(vi) $H_\infty$ smoother (43) – (45).


Example 3 [35]. As in Example 6.6, consider the non-Gaussian process noise
signal $w(t) = \sigma_{\sin(t)}^{-1}\sin(t)$, where $\sigma_{\sin(t)}^2$ denotes the sample variance of $\sin(t)$. The
results of a simulation study appear in Fig. 8. It can be seen that the $H_\infty$ solutions,
which accommodate input uncertainty, perform better than those relying on
Gaussian noise assumptions. In this example, the developed $H_\infty$ smoother (43) –
(45) exhibits the best mean-square-error performance.


“Inquiry is fatal to certainty.” William James Durant 
9.3. Robust Discrete-Time Estimation


9.3.3. Discrete-Time Bounded Real Lemma


The development of discrete-time $H_\infty$ filters and smoothers proceeds analogously
to the continuous-time case. From Lyapunov stability theory [36], for the unforced
system


$$x_{k+1} = A_kx_k, \tag{57}$$


$A_k \in \mathbb{R}^{n \times n}$, to be asymptotically stable over the interval $k \in [1, N]$, a Lyapunov
function, $V_k(x_k)$, is required to satisfy $\Delta V_k(x_k) < 0$, where $\Delta V_k(x_k) = V_{k+1}(x_{k+1}) - V_k(x_k)$
denotes the first backward difference of $V_k(x_k)$. Consider the candidate
Lyapunov function $V_k(x_k) = x_k^TP_kx_k$, where $P_k = P_k^T \in \mathbb{R}^{n \times n}$ is positive definite.
To guarantee $x_k \in \ell_2$, it is required that


$$\Delta V_k(x_k) = x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k < 0. \tag{58}$$


Now let $y = \mathcal{G}w$ denote the output of the system


$$x_{k+1} = A_kx_k + B_kw_k, \tag{59}$$
$$y_k = C_kx_k, \tag{60}$$


where $w_k \in \mathbb{R}^m$, $B_k \in \mathbb{R}^{n \times m}$ and $C_k \in \mathbb{R}^{p \times n}$.
The Bounded Real Lemma [18] states that $w \in \ell_2$ implies $y \in \ell_2$ if

$$x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k + y_k^Ty_k - \gamma^2w_k^Tw_k < 0 \tag{61}$$

for a $\gamma \in \mathbb{R}$. Summing (61) from $k = 0$ to $k = N-1$ yields the objective


$$-x_0^TP_0x_0 + \sum_{k=0}^{N-1} y_k^Ty_k - \gamma^2\sum_{k=0}^{N-1} w_k^Tw_k < 0, \tag{62}$$

that is,

$$\frac{-x_0^TP_0x_0 + \sum_{k=0}^{N-1} y_k^Ty_k}{\sum_{k=0}^{N-1} w_k^Tw_k} < \gamma^2. \tag{63}$$

“Education is the path from cocky ignorance to miserable uncertainty.” Samuel Langhorne Clemens 
aka. Mark Twain 
Assuming that $x_0 = 0$,


$$\|\mathcal{G}\|_\infty = \sup_{w \ne 0} \frac{\|y\|_2}{\|w\|_2} < \gamma. \tag{64}$$


Conditions for achieving the above objectives are established below.


Lemma 7: The discrete-time Bounded Real Lemma [18]: In respect of the above
system $\mathcal{G}$, suppose that the Riccati difference equation


$$P_k = A_k^TP_{k+1}A_k + \gamma^{-2}A_k^TP_{k+1}B_k\left(I - \gamma^{-2}B_k^TP_{k+1}B_k\right)^{-1}B_k^TP_{k+1}A_k + C_k^TC_k, \tag{65}$$


with $P_N = 0$, has a positive definite symmetric solution on $[0, N]$. Then $\|\mathcal{G}\|_\infty \le \gamma$
for any $w \in \ell_2$.

Proof: From the approach of Xie et al [18], define


$$p_k = w_k - \gamma^{-2}\left(I - \gamma^{-2}B_k^TP_{k+1}B_k\right)^{-1}B_k^TP_{k+1}A_kx_k. \tag{66}$$


It is easily verified, by substituting (59), (60) and (65) and completing the square, that


$$x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k + y_k^Ty_k - \gamma^2w_k^Tw_k = -\gamma^2p_k^T\left(I - \gamma^{-2}B_k^TP_{k+1}B_k\right)p_k,$$


which implies (61) – (62) and (63) under the assumption $x_0 = 0$.
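
The lemma can be checked numerically on a small example. The following is a minimal sketch, assuming a time-invariant system and NumPy; the function names `brl_riccati` and `energy_ratio` and the example matrices are hypothetical and chosen only for illustration. It iterates (65) backwards from $P_N = 0$ and compares the energy ratio in (63) with $\gamma^2$ for one input realisation.

```python
import numpy as np

def brl_riccati(A, B, C, gamma, N):
    """Iterate the Riccati difference equation (65) backwards from P_N = 0."""
    n, m = B.shape
    P = [None] * (N + 1)
    P[N] = np.zeros((n, n))
    for k in range(N - 1, -1, -1):
        Pn = P[k + 1]
        S = np.eye(m) - B.T @ Pn @ B / gamma**2
        # The inverse in (65) is only meaningful while S stays positive definite.
        if np.min(np.linalg.eigvalsh(S)) <= 0.0:
            raise ValueError("gamma is too small for this horizon")
        P[k] = (A.T @ Pn @ A + C.T @ C
                + A.T @ Pn @ B @ np.linalg.solve(S, B.T @ Pn @ A) / gamma**2)
    return P

def energy_ratio(A, B, C, w):
    """Return sum(y'y) / sum(w'w) with x_0 = 0, i.e. the left-hand side of (63)."""
    x = np.zeros(A.shape[0])
    num, den = 0.0, 0.0
    for wk in w:
        yk = C @ x
        num += float(yk @ yk)
        den += float(wk @ wk)
        x = A @ x + B @ wk
    return num / den

# Illustrative (assumed) stable second-order system.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
gamma, N = 5.0, 50
P = brl_riccati(A, B, C, gamma, N)
w = np.random.default_rng(0).standard_normal((N, 1))
print(energy_ratio(A, B, C, w), "<", gamma**2)
```

A single realisation cannot prove the bound; it only illustrates that a feasible backward recursion is accompanied by an energy ratio below $\gamma^2$.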


The above lemma relies on the simplifying assumption $E\{w_jw_k^T\} = I\delta_{jk}$. When
$E\{w_jw_k^T\} = Q_k\delta_{jk}$, the scaled matrix $\bar{B}_k = B_kQ_k^{1/2}$ may be used in place of $B_k$
above. In the case where $\mathcal{G}$ possesses a direct feedthrough matrix, namely, $y_k = C_kx_k + D_kw_k$,
the Riccati difference equation within the above lemma becomes


$$P_k = A_k^TP_{k+1}A_k + C_k^TC_k + \left(A_k^TP_{k+1}B_k + C_k^TD_k\right)\left(\gamma^2I - B_k^TP_{k+1}B_k - D_k^TD_k\right)^{-1}\left(B_k^TP_{k+1}A_k + D_k^TC_k\right). \tag{67}$$

A verification is requested in the problems. It will be shown that predictors, filters
and smoothers satisfy an $H_\infty$ performance objective if there exist solutions to
Riccati difference equations arising from the application of Lemma 7 to the

“And as he thus spake for himself, Festus said with a loud voice, Paul, thou art beside thyself; much 
learning doth make thee mad.” Acts 26: 24 
corresponding error systems. A summary of the discrete-time results from [5],
[11], [13] and the further details described in [21], [30], is presented below.


9.3.4 Discrete-Time $H_\infty$ Prediction
9.3.4.1 Problem Definition


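Consider a nominal system $\mathcal{G}_2$,


$$x_{k+1} = A_kx_k + B_kw_k, \tag{68}$$
$$y_{2,k} = C_{2,k}x_k, \tag{69}$$


together with a fictitious reference system $\mathcal{G}_1$ realised by (68) and


$$y_{1,k} = C_{1,k}x_k, \tag{70}$$


where $A_k$, $B_k$, $C_{2,k}$ and $C_{1,k}$ are of appropriate dimensions. The problem of interest
is to find a solution $\mathcal{H}$ that produces one-step-ahead predictions, $\hat{y}_{1,k/k-1}$, given
measurements

$$z_k = y_{2,k} + v_k \tag{71}$$

at time $k-1$. The prediction error is defined as


$$\tilde{y}_{k/k-1} = y_{1,k} - \hat{y}_{1,k/k-1}. \tag{72}$$


The error sequence (72) is generated by $\tilde{y} = \mathcal{R}_{\tilde{y}i}\,i$, where $\mathcal{R}_{\tilde{y}i} = \begin{bmatrix} \mathcal{H} & \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1 \end{bmatrix}$,
$i = \begin{bmatrix} v \\ w \end{bmatrix}$ and the objective is to achieve
$\sum_{k=0}^{N-1} \tilde{y}_{k/k-1}^T\tilde{y}_{k/k-1} - \gamma^2\sum_{k=0}^{N-1} i_k^Ti_k < 0$, for some $\gamma \in \mathbb{R}$. For convenience, it is assumed that $w_k \in \mathbb{R}^m$,
$E\{w_k\} = 0$, $E\{w_jw_k^T\} = Q_k\delta_{jk}$, $v_k \in \mathbb{R}^p$, $E\{v_k\} = 0$, $E\{v_jv_k^T\} = R_k\delta_{jk}$ and
$E\{w_jv_k^T\} = 0$.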


9.3.4.2 $H_\infty$ Solution


The $H_\infty$ predictor has the same structure as the optimum minimum-variance (or
Kalman) predictor. It is given by


“Why waste time learning when ignorance is instantaneous?” William Boyd Watterson II 
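$$\hat{x}_{k+1/k} = (A_k - K_kC_{2,k})\hat{x}_{k/k-1} + K_kz_k, \tag{73}$$
$$\hat{y}_{1,k/k-1} = C_{1,k}\hat{x}_{k/k-1}, \tag{74}$$


where

$$K_k = A_kP_{k/k-1}C_{2,k}^T\left(C_{2,k}P_{k/k-1}C_{2,k}^T + R_k\right)^{-1} \tag{75}$$

is the one-step-ahead predictor gain,

$$P_{k/k-1} = \left(M_k^{-1} - \gamma^{-2}C_{1,k}^TC_{1,k}\right)^{-1}, \tag{76}$$


and $M_k = M_k^T > 0$ satisfies the Riccati difference equation


$$M_{k+1} = A_kM_kA_k^T + B_kQ_kB_k^T - A_kM_k\begin{bmatrix} C_{1,k}^T & C_{2,k}^T \end{bmatrix}\begin{bmatrix} C_{1,k}M_kC_{1,k}^T - \gamma^2I & C_{1,k}M_kC_{2,k}^T \\ C_{2,k}M_kC_{1,k}^T & R_k + C_{2,k}M_kC_{2,k}^T \end{bmatrix}^{-1}\begin{bmatrix} C_{1,k} \\ C_{2,k} \end{bmatrix}M_kA_k^T \tag{77}$$


such that $\begin{bmatrix} C_{1,k}M_kC_{1,k}^T - \gamma^2I & C_{1,k}M_kC_{2,k}^T \\ C_{2,k}M_kC_{1,k}^T & R_k + C_{2,k}M_kC_{2,k}^T \end{bmatrix} > 0$. The above predictor is also
known as an a priori filter within [11], [13], [30].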
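
One step of the gain calculation (75) – (77) can be sketched in code. This is a minimal illustration, assuming NumPy and time-invariant matrices; the function name `hinf_predictor_step` is hypothetical. The Riccati update is written in the stacked form that the proof of Lemma 8 later labels (81), with $\bar{C} = [C_1; C_2]$ and $\bar{R} = \mathrm{diag}(-\gamma^2I, R)$.

```python
import numpy as np

def hinf_predictor_step(M, A, B, C1, C2, Q, R, gamma):
    """Return (K_k, M_{k+1}) from M_k using (75) - (77)."""
    # (76): P_{k/k-1} = (M_k^{-1} - gamma^{-2} C1' C1)^{-1}
    P_inv = np.linalg.inv(M) - C1.T @ C1 / gamma**2
    if np.min(np.linalg.eigvalsh(P_inv)) <= 0.0:
        raise ValueError("gamma is too small: P_{k/k-1} is not positive definite")
    P = np.linalg.inv(P_inv)
    # (75): one-step-ahead predictor gain
    K = A @ P @ C2.T @ np.linalg.inv(C2 @ P @ C2.T + R)
    # (77), with the two output matrices stacked
    p1, p2 = C1.shape[0], C2.shape[0]
    C_bar = np.vstack([C1, C2])
    R_bar = np.block([[-gamma**2 * np.eye(p1), np.zeros((p1, p2))],
                      [np.zeros((p2, p1)), R]])
    G = R_bar + C_bar @ M @ C_bar.T
    M_next = (A @ M @ A.T + B @ Q @ B.T
              - A @ M @ C_bar.T @ np.linalg.solve(G, C_bar @ M @ A.T))
    return K, M_next

# Scalar illustration with assumed values.
A = np.array([[0.9]]); B = np.eye(1); C = np.eye(1)
Q = np.eye(1); R = np.eye(1)
K, M_next = hinf_predictor_step(np.eye(1), A, B, C, C, Q, R, gamma=2.0)
print(K, M_next)
```

As $\gamma$ grows, the $\gamma^{-2}$ terms vanish and the step reduces to the familiar Kalman predictor recursion.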


9.3.4.3 Performance 


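Following the approach in the continuous-time case, by subtracting (73) – (74)
from (68), (70), the predictor error system is


$$\begin{bmatrix} \tilde{x}_{k+1/k} \\ \tilde{y}_{k/k-1} \end{bmatrix} = \begin{bmatrix} A_k - K_kC_{2,k} & \begin{bmatrix} -K_k & B_k \end{bmatrix} \\ C_{1,k} & \begin{bmatrix} 0 & 0 \end{bmatrix} \end{bmatrix}\begin{bmatrix} \tilde{x}_{k/k-1} \\ i_k \end{bmatrix} = \mathcal{R}_{\tilde{y}i}\,i, \quad \tilde{x}_0 = 0, \tag{78}$$


where $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$, $\mathcal{R}_{\tilde{y}i}$ denotes the system having the above realisation and $i = \begin{bmatrix} v \\ w \end{bmatrix}$. It is
shown below that the prediction error satisfies the desired performance objective.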


“Give me a fruitful error any time, full of seeds bursting with its own corrections. You can keep your 
sterile truth for yourself.” Vilfredo Federico Damaso Pareto 
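Lemma 8 [11], [13], [30]: In respect of the $H_\infty$ prediction problem (68) – (72), the
existence of $M_k = M_k^T > 0$ for the Riccati difference equation (77) ensures that
the solution (73) – (74) achieves the performance objective
$\sum_{k=0}^{N-1} \tilde{y}_{k/k-1}^T\tilde{y}_{k/k-1} - \gamma^2\sum_{k=0}^{N-1} i_k^Ti_k < 0$.


Proof: By applying the Bounded Real Lemma to $\mathcal{R}_{\tilde{y}i}^H$ and taking the adjoint to
address $\mathcal{R}_{\tilde{y}i}$, it is required that there exists a positive definite symmetric solution to


$$\begin{aligned} P_{k+1} = {} & (A_k - K_kC_{2,k})P_k(A_k - K_kC_{2,k})^T + K_kR_kK_k^T + B_kQ_kB_k^T \\ & + \gamma^{-2}(A_k - K_kC_{2,k})P_kC_{1,k}^T\left(I - \gamma^{-2}C_{1,k}P_kC_{1,k}^T\right)^{-1}C_{1,k}P_k(A_k - K_kC_{2,k})^T \\ = {} & (A_k - K_kC_{2,k})\left(P_k + \gamma^{-2}P_kC_{1,k}^T\left(I - \gamma^{-2}C_{1,k}P_kC_{1,k}^T\right)^{-1}C_{1,k}P_k\right)(A_k - K_kC_{2,k})^T \\ & + K_kR_kK_k^T + B_kQ_kB_k^T \\ = {} & (A_k - K_kC_{2,k})\left(P_k^{-1} - \gamma^{-2}C_{1,k}^TC_{1,k}\right)^{-1}(A_k - K_kC_{2,k})^T + K_kR_kK_k^T + B_kQ_kB_k^T, \end{aligned} \tag{79}$$

in which use was made of the Matrix Inversion Lemma. Defining $P_{k/k-1} = \left(P_k^{-1} - \gamma^{-2}C_{1,k}^TC_{1,k}\right)^{-1}$ leads to


$$\begin{aligned} \left(P_{k+1/k}^{-1} + \gamma^{-2}C_{1,k+1}^TC_{1,k+1}\right)^{-1} &= (A_k - K_kC_{2,k})P_{k/k-1}(A_k - K_kC_{2,k})^T + K_kR_kK_k^T + B_kQ_kB_k^T \\ &= A_kP_{k/k-1}A_k^T + B_kQ_kB_k^T - A_kP_{k/k-1}C_{2,k}^T\left(R_k + C_{2,k}P_{k/k-1}C_{2,k}^T\right)^{-1}C_{2,k}P_{k/k-1}A_k^T \end{aligned}$$


and applying the Matrix Inversion Lemma gives


$$\left(P_{k+1/k}^{-1} + \gamma^{-2}C_{1,k+1}^TC_{1,k+1}\right)^{-1} = A_k\left(P_{k/k-1}^{-1} + C_{2,k}^TR_k^{-1}C_{2,k}\right)^{-1}A_k^T + B_kQ_kB_k^T.$$


The change of variable (76), namely, $P_{k/k-1}^{-1} = M_k^{-1} - \gamma^{-2}C_{1,k}^TC_{1,k}$, results in


$$\begin{aligned} M_{k+1} &= A_k\left(M_k^{-1} + C_{2,k}^TR_k^{-1}C_{2,k} - \gamma^{-2}C_{1,k}^TC_{1,k}\right)^{-1}A_k^T + B_kQ_kB_k^T \\ &= A_k\left(M_k^{-1} + \bar{C}_k^T\bar{R}_k^{-1}\bar{C}_k\right)^{-1}A_k^T + B_kQ_kB_k^T, \end{aligned} \tag{80}$$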


“Never interrupt your enemy when he is making a mistake.” Napoléon Bonaparte 
where $\bar{C}_k = \begin{bmatrix} C_{1,k} \\ C_{2,k} \end{bmatrix}$ and $\bar{R}_k = \begin{bmatrix} -\gamma^2I & 0 \\ 0 & R_k \end{bmatrix}$. Applying the Matrix Inversion Lemma
within (80) gives


$$M_{k+1} = A_kM_kA_k^T - A_kM_k\bar{C}_k^T\left(\bar{R}_k + \bar{C}_kM_k\bar{C}_k^T\right)^{-1}\bar{C}_kM_kA_k^T + B_kQ_kB_k^T. \tag{81}$$

Expanding (81) yields (77). The existence of $M_k > 0$ for the above Riccati
difference equation implies $P_k > 0$ for (79). Thus, it follows from Lemma 7 that
the stated performance objective is achieved.


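9.3.5 Discrete-Time $H_\infty$ Filtering

9.3.5.1 Problem Definition

Consider again the configuration of Fig. 1. Assume that the systems $\mathcal{G}_2$ and $\mathcal{G}_1$
have the realisations (68) – (69) and (68), (70), respectively. It is desired to find a
solution $\mathcal{H}$ that operates on the measurements (71) and produces the filtered
estimates $\hat{y}_{1,k/k}$. The filtered error sequence,


$$\tilde{y}_{k/k} = y_{1,k} - \hat{y}_{1,k/k}, \tag{82}$$


is generated by $\tilde{y} = \mathcal{R}_{\tilde{y}i}\,i$, where $\mathcal{R}_{\tilde{y}i} = \begin{bmatrix} \mathcal{H} & \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1 \end{bmatrix}$, $i = \begin{bmatrix} v \\ w \end{bmatrix}$. The $H_\infty$
performance objective is to achieve $\sum_{k=0}^{N-1} \tilde{y}_{k/k}^T\tilde{y}_{k/k} - \gamma^2\sum_{k=0}^{N-1} i_k^Ti_k < 0$, for some $\gamma \in \mathbb{R}$.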


9.3.5.2 $H_\infty$ Solution
As explained in Chapter 4, filtered states can be evolved from


$$\hat{x}_{k/k} = A_{k-1}\hat{x}_{k-1/k-1} + L_k\left(z_k - C_{2,k}A_{k-1}\hat{x}_{k-1/k-1}\right), \tag{83}$$


where $L_k \in \mathbb{R}^{n \times p}$ is a filter gain. The above recursion is called an a posteriori
filter in [11], [13], [30]. Output estimates are obtained from


$$\hat{y}_{1,k-1/k-1} = C_{1,k-1}\hat{x}_{k-1/k-1}. \tag{84}$$


The filter gain is calculated as 


“T believe the most solemn duty of the American president is to protect the American people. If 
America shows uncertainty and weakness in this decade, the world will drift toward tragedy. This will 
not happen on my watch.” George Walker Bush 
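$$L_k = M_kC_{2,k}^T\left(C_{2,k}M_kC_{2,k}^T + R_k\right)^{-1}, \tag{85}$$


where $M_k = M_k^T > 0$ satisfies the Riccati difference equation


$$\begin{aligned} M_k = {} & A_{k-1}M_{k-1}A_{k-1}^T + B_{k-1}Q_{k-1}B_{k-1}^T - A_{k-1}M_{k-1}\begin{bmatrix} C_{1,k-1}^T & C_{2,k-1}^T \end{bmatrix} \\ & \times \begin{bmatrix} C_{1,k-1}M_{k-1}C_{1,k-1}^T - \gamma^2I & C_{1,k-1}M_{k-1}C_{2,k-1}^T \\ C_{2,k-1}M_{k-1}C_{1,k-1}^T & R_{k-1} + C_{2,k-1}M_{k-1}C_{2,k-1}^T \end{bmatrix}^{-1}\begin{bmatrix} C_{1,k-1} \\ C_{2,k-1} \end{bmatrix}M_{k-1}A_{k-1}^T \end{aligned} \tag{86}$$


such that $\begin{bmatrix} C_{1,k-1}M_{k-1}C_{1,k-1}^T - \gamma^2I & C_{1,k-1}M_{k-1}C_{2,k-1}^T \\ C_{2,k-1}M_{k-1}C_{1,k-1}^T & R_{k-1} + C_{2,k-1}M_{k-1}C_{2,k-1}^T \end{bmatrix} > 0$.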
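
The a posteriori recursion (83) – (86) can be sketched as a single update. This is a minimal illustration, assuming NumPy and time-invariant matrices; the function name `hinf_filter_step` is hypothetical. The Riccati update uses the compact form $M_k = A(M_{k-1}^{-1} + C_2^TR^{-1}C_2 - \gamma^{-2}C_1^TC_1)^{-1}A^T + BQB^T$, which the proof of Lemma 9 identifies with (86).

```python
import numpy as np

def hinf_filter_step(x_prev, M_prev, z, A, B, C1, C2, Q, R, gamma):
    """One a posteriori H-infinity filter step; returns (x_k, M_k, y_hat_k)."""
    # Riccati update in compact form (equivalent to (86) per the text).
    inner = (np.linalg.inv(M_prev) + C2.T @ np.linalg.inv(R) @ C2
             - C1.T @ C1 / gamma**2)
    if np.min(np.linalg.eigvalsh(inner)) <= 0.0:
        raise ValueError("gamma is too small: no positive definite solution")
    M = A @ np.linalg.inv(inner) @ A.T + B @ Q @ B.T
    L = M @ C2.T @ np.linalg.inv(C2 @ M @ C2.T + R)   # filter gain (85)
    x_pred = A @ x_prev
    x = x_pred + L @ (z - C2 @ x_pred)                 # state update (83)
    return x, M, C1 @ x                                # output estimate as in (84)

# Scalar illustration with assumed values.
A = np.array([[0.9]]); B = np.eye(1); C = np.eye(1)
Q = np.eye(1); R = np.eye(1)
x, M, y_hat = hinf_filter_step(np.zeros(1), np.eye(1), np.array([1.0]),
                               A, B, C, C, Q, R, gamma=2.0)
print(x, M, y_hat)
```

The feasibility check on `inner` is a convenience added here; it is not a substitute for the positivity condition stated with (86).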


9.3.5.3 Performance 

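Subtracting (83) from (68) gives $\tilde{x}_{k/k} = A_{k-1}x_{k-1} + B_{k-1}w_{k-1} - A_{k-1}\hat{x}_{k-1/k-1}
+ L_kC_{2,k}A_{k-1}\hat{x}_{k-1/k-1} - L_k\left(C_{2,k}(A_{k-1}x_{k-1} + B_{k-1}w_{k-1}) + v_k\right)$. Denote $i_k = \begin{bmatrix} v_k \\ w_{k-1} \end{bmatrix}$,
then the filtered error system may be written as


$$\begin{bmatrix} \tilde{x}_{k/k} \\ \tilde{y}_{k-1/k-1} \end{bmatrix} = \begin{bmatrix} (I - L_kC_{2,k})A_{k-1} & \begin{bmatrix} -L_k & (I - L_kC_{2,k})B_{k-1} \end{bmatrix} \\ C_{1,k-1} & \begin{bmatrix} 0 & 0 \end{bmatrix} \end{bmatrix}\begin{bmatrix} \tilde{x}_{k-1/k-1} \\ i_k \end{bmatrix} = \mathcal{R}_{\tilde{y}i}\,i, \tag{87}$$


with $\tilde{x}_0 = 0$, where $\mathcal{R}_{\tilde{y}i}$ denotes the system having the above realisation. It is
shown below that the filtered error satisfies the desired performance objective.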


Lemma 9 [11], [13], [30]: In respect of the $H_\infty$ problem (68) – (70), (82), the
solution (83) – (84) achieves the performance $\sum_{k=0}^{N-1} \tilde{y}_{k/k}^T\tilde{y}_{k/k} - \gamma^2\sum_{k=0}^{N-1} i_k^Ti_k < 0$.


Proof: By applying the Bounded Real Lemma to $\mathcal{R}_{\tilde{y}i}^H$ and taking the adjoint to
address $\mathcal{R}_{\tilde{y}i}$, it is required that there exists a positive definite symmetric solution to


“Hell, there are no rules here — we’re trying to accomplish something.” Thomas Alva Edison 
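$$\begin{aligned} P_k = {} & (I - L_kC_{2,k})A_{k-1}P_{k-1}A_{k-1}^T(I - C_{2,k}^TL_k^T) \\ & + \gamma^{-2}(I - L_kC_{2,k})A_{k-1}P_{k-1}C_{1,k-1}^T\left(I - \gamma^{-2}C_{1,k-1}P_{k-1}C_{1,k-1}^T\right)^{-1}C_{1,k-1}P_{k-1}A_{k-1}^T(I - C_{2,k}^TL_k^T) \\ & + (I - L_kC_{2,k})B_{k-1}Q_{k-1}B_{k-1}^T(I - C_{2,k}^TL_k^T) + L_kR_kL_k^T \\ = {} & (I - L_kC_{2,k})A_{k-1}\left(P_{k-1}^{-1} - \gamma^{-2}C_{1,k-1}^TC_{1,k-1}\right)^{-1}A_{k-1}^T(I - C_{2,k}^TL_k^T) \\ & + (I - L_kC_{2,k})B_{k-1}Q_{k-1}B_{k-1}^T(I - C_{2,k}^TL_k^T) + L_kR_kL_k^T, \end{aligned} \tag{88}$$

in which use was made of the Matrix Inversion Lemma. Defining

$$P_{k/k}^{-1} = P_k^{-1} - \gamma^{-2}C_{1,k}^TC_{1,k}, \tag{89}$$

using (85) and applying the Matrix Inversion Lemma leads to

$$\begin{aligned} \left(P_{k/k}^{-1} + \gamma^{-2}C_{1,k}^TC_{1,k}\right)^{-1} &= (I - L_kC_{2,k})\left(A_{k-1}P_{k-1/k-1}A_{k-1}^T + B_{k-1}Q_{k-1}B_{k-1}^T\right)(I - C_{2,k}^TL_k^T) + L_kR_kL_k^T \\ &= (I - L_kC_{2,k})M_k(I - C_{2,k}^TL_k^T) + L_kR_kL_k^T \\ &= M_k - M_kC_{2,k}^T\left(R_k + C_{2,k}M_kC_{2,k}^T\right)^{-1}C_{2,k}M_k \\ &= \left(M_k^{-1} + C_{2,k}^TR_k^{-1}C_{2,k}\right)^{-1}, \end{aligned} \tag{90}$$

where

$$M_k = A_{k-1}P_{k-1/k-1}A_{k-1}^T + B_{k-1}Q_{k-1}B_{k-1}^T. \tag{91}$$


It follows from (90) that $P_{k/k}^{-1} + \gamma^{-2}C_{1,k}^TC_{1,k} = M_k^{-1} + C_{2,k}^TR_k^{-1}C_{2,k}$ and


$$P_{k/k}^{-1} = M_k^{-1} + C_{2,k}^TR_k^{-1}C_{2,k} - \gamma^{-2}C_{1,k}^TC_{1,k} = M_k^{-1} + \bar{C}_k^T\bar{R}_k^{-1}\bar{C}_k, \tag{92}$$

where $\bar{C}_k = \begin{bmatrix} C_{1,k} \\ C_{2,k} \end{bmatrix}$ and $\bar{R}_k = \begin{bmatrix} -\gamma^2I & 0 \\ 0 & R_k \end{bmatrix}$. Substituting (92) into (91) yields

$$M_k = A_{k-1}\left(M_{k-1}^{-1} + \bar{C}_{k-1}^T\bar{R}_{k-1}^{-1}\bar{C}_{k-1}\right)^{-1}A_{k-1}^T + B_{k-1}Q_{k-1}B_{k-1}^T, \tag{93}$$


which is the same as (86). From (77), the existence of $M_k > 0$ for the above Riccati
difference equation implies the existence of a $P_k > 0$ for (88). Thus, it follows from
Lemma 7 that the stated performance objective is achieved.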


“T can live with doubt and uncertainty. I think it’s much more interesting to live not knowing than to 
have answers which might be wrong.” Richard Phillips Feynman 
9.3.6 Solution to the General Filtering Problem


Limebeer, Green and Walker express Riccati difference equations such as (86) in 
a compact form using J-factorisation [5], [21]. The solutions for the general 
filtering problem follow immediately from their results. Consider 


Xe A, Bux 0 xX; 

Vere = Cie Die Doe i . (94) 

Zk Chie Dy. 0 Dr eit | 

ee ng } ul c= | Boe (Pat: Pies | cel ee = 
0 =i Chik | Paik 0 

[ Biss 0]. From the approach of [5], [21], the Riccati difference equation 


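corresponding to the $H_\infty$ problem (94) is

$$M_{k+1} = A_kM_kA_k^T + B_kJ_kB_k^T - \left(A_kM_kC_k^T + B_kJ_kD_k^T\right)\left(C_kM_kC_k^T + D_kJ_kD_k^T\right)^{-1}\left(A_kM_kC_k^T + B_kJ_kD_k^T\right)^T. \tag{95}$$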
Suppose in a general filtering problem that G, is realised by (68), ¥., = Cy 4%; 
+ D,,.w,, G, is realised by (68) and y,, = C,.x, + D,,w,. Then substituting 
Bus = [0 B, |. Cis = Cas Coin = Cris Die - [0 Diels Diag =I and 
Drip = [1 Dy, | into (95) yields 
AM,C,,+B,0,D, 
Ma = A.M ,A, lee 2 cee | 
k ~Co4 + Di 24k 
i Gi, +D,,0,Di, -y I CMC, +D,,0,D>, 
CC +D,,O,Dii R, + CMC +D,,O,D5, 
x| CM, Ap +D,,O,Bi, C,,M,A, +D,,0,B; | +B.0,B) . (96) 


The filter solution is given by 


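$$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k\left(z_k - C_{2,k}\hat{x}_{k/k-1}\right), \tag{97}$$
$$\hat{y}_{1,k/k} = C_{1,k}\hat{x}_{k/k-1} + L_k\left(z_k - C_{2,k}\hat{x}_{k/k-1}\right), \tag{98}$$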


“If we knew what it is we were doing, it would not be called research, would it?” Albert Einstein 
where $K_k = \left(A_kM_kC_{2,k}^T + B_kQ_kD_{2,k}^T\right)\Omega_k^{-1}$, $L_k = \left(C_{1,k}M_kC_{2,k}^T + D_{1,k}Q_kD_{2,k}^T\right)\Omega_k^{-1}$ and
$\Omega_k = C_{2,k}M_kC_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k$.


9.3.7 Discrete-Time $H_\infty$ Smoothing


9.3.7.1 Problem Definition


Suppose that measurements (71) of a system (68) – (69) are available over an
interval $k \in [1, N]$. The problem of interest is to calculate smoothed estimates
$\hat{y}_{k/N}$ of $y_k$ such that the error sequence


$$\tilde{y}_{k/N} = y_k - \hat{y}_{k/N} \tag{99}$$


is in $\ell_2$.

9.3.7.2 $H_\infty$ Solution


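The following fixed-interval smoother for output estimation [28] employs the gain
for the $H_\infty$ predictor,


$$K_k = A_kP_{k/k-1}C_{2,k}^T\Omega_k^{-1}, \tag{100}$$


where $\Omega_k = C_{2,k}P_{k/k-1}C_{2,k}^T + R_k$, in which $P_{k/k-1}$ is obtained from (76) and (77). The
gain (100) is used in the minimum-variance smoother structure described in
Chapter 7, viz.,


$$\begin{bmatrix} \hat{x}_{k+1/k} \\ \alpha_k \end{bmatrix} = \begin{bmatrix} A_k - K_kC_{2,k} & K_k \\ -\Omega_k^{-1/2}C_{2,k} & \Omega_k^{-1/2} \end{bmatrix}\begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix}, \tag{101}$$
$$\begin{bmatrix} \xi_{k-1} \\ \beta_k \end{bmatrix} = \begin{bmatrix} A_k^T - C_{2,k}^TK_k^T & C_{2,k}^T\Omega_k^{-1/2} \\ -K_k^T & \Omega_k^{-1/2} \end{bmatrix}\begin{bmatrix} \xi_k \\ \alpha_k \end{bmatrix}, \quad \xi_N = 0, \tag{102}$$
$$\hat{y}_{k/N} = z_k - R_k\beta_k. \tag{103}$$


It is argued below that this smoother meets the desired $H_\infty$ performance objective.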
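
The forward and backward passes of (100) – (103) can be sketched as follows. This is a minimal illustration, assuming NumPy, a time-invariant model and output estimation with $C_1 = C_2 = C$; the function name `hinf_smoother` and the zero initial state estimate are assumptions made here. The Riccati update uses the compact form $M_{k+1} = A(M_k^{-1} + C^TR^{-1}C - \gamma^{-2}C^TC)^{-1}A^T + BQB^T$.

```python
import numpy as np

def hinf_smoother(z, A, B, C, Q, R, gamma, M0):
    """Fixed-interval smoother sketch; z has shape (N, p), returns y_hat of the same shape."""
    N = z.shape[0]
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    x = np.zeros(n)          # assumed initial prediction
    M = M0.copy()
    alphas, gains, om_isqrt = [], [], []
    for k in range(N):       # forward pass, (100) and (101)
        P = np.linalg.inv(np.linalg.inv(M) - C.T @ C / gamma**2)   # (76)
        Om = C @ P @ C.T + R
        K = A @ P @ C.T @ np.linalg.inv(Om)                        # (100)
        evals, evecs = np.linalg.eigh(Om)
        S = evecs @ np.diag(evals**-0.5) @ evecs.T                 # Omega^{-1/2}
        innov = z[k] - C @ x
        alphas.append(S @ innov)
        gains.append(K)
        om_isqrt.append(S)
        x = A @ x + K @ innov
        inner = np.linalg.inv(M) + C.T @ R_inv @ C - C.T @ C / gamma**2
        M = A @ np.linalg.inv(inner) @ A.T + B @ Q @ B.T
    xi = np.zeros(n)         # xi_N = 0
    y_hat = np.zeros_like(z)
    for k in range(N - 1, -1, -1):   # backward pass, (102) and (103)
        K, S = gains[k], om_isqrt[k]
        beta = -K.T @ xi + S @ alphas[k]
        y_hat[k] = z[k] - R @ beta
        xi = (A - K @ C).T @ xi + C.T @ S @ alphas[k]
    return y_hat
```

A typical call passes the whole measurement record at once, for example `hinf_smoother(z, A, B, C, Q, R, gamma=10.0, M0=np.eye(n))`. Storing the gains from the forward pass avoids recomputing the Riccati solution in the backward pass, at the cost of memory proportional to $N$.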


9.3.7.3 $H_\infty$ Performance


It is easily shown that the smoother error system is 


“T have had my results for a long time: but I do not yet know how I am to arrive at them.” Karl 
Friedrich Gauss 
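$$\begin{bmatrix} \tilde{x}_{k+1/k} \\ \xi_{k-1} \\ \tilde{y}_{k/N} \end{bmatrix} = \begin{bmatrix} A_k - K_kC_{2,k} & 0 & \begin{bmatrix} -K_k & B_k \end{bmatrix} \\ C_{2,k}^T\Omega_k^{-1}C_{2,k} & A_k^T - C_{2,k}^TK_k^T & \begin{bmatrix} C_{2,k}^T\Omega_k^{-1} & 0 \end{bmatrix} \\ R_k\Omega_k^{-1}C_{2,k} & -R_kK_k^T & \begin{bmatrix} R_k\Omega_k^{-1} - I & 0 \end{bmatrix} \end{bmatrix}\begin{bmatrix} \tilde{x}_{k/k-1} \\ \xi_k \\ i_k \end{bmatrix} = \mathcal{R}_{ei}\,i, \tag{104}$$

with $\tilde{x}_0 = 0$, where $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$, $i = \begin{bmatrix} v \\ w \end{bmatrix}$ and $\mathcal{R}_{ei}$ denotes the system having the above realisation.


Lemma 10: In respect of the smoother error system (104), if there exists a
symmetric positive definite solution to (77) for $\gamma > 0$, then the smoother (101) –
(103) achieves $\mathcal{R}_{ei} \in \ell_\infty$, that is, $i \in \ell_2$ implies $e \in \ell_2$.


Outline of Proof: From Lemma 8, $\tilde{x} \in \ell_2$, since it evolves within the predictor
error system. Therefore, $\xi \in \ell_2$, since it evolves within the adjoint predictor
error system. Then $\tilde{y} \in \ell_2$, since it is a linear combination of $\tilde{x}$, $\xi$ and $i \in \ell_2$.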


9.3.7.4 Performance Comparison 


Example 4 [28]. A voiced speech utterance “a e i o u” was sampled at 8 kHz for
the purpose of comparing smoother performance. Simulations were conducted
with the zero-mean, unity-variance speech sample interpolated to a 16 kHz sample
rate, to which 200 realisations of Gaussian measurement noise were added and the
signal to noise ratio was varied from -5 to 5 dB. The speech sample is modelled as
a first-order autoregressive process


$$x_{k+1} = Ax_k + w_k, \tag{105}$$


where $A \in \mathbb{R}$, $0 < A < 1$. Estimates for $\sigma_w^2$ and $A$ were calculated at 20 dB SNR
using an EM algorithm, see Chapter 8.


“If I have seen further it is only by standing on the shoulders of giants.” Zsaac Newton 
[Fig. 9: plot of traces (i) to (v) against SNR from -5 to 5 dB.]


Fig. 9. Speech estimate performance comparison: (i) data (crosses), (ii) Kalman filter (dotted line), (iii)
$H_\infty$ filter (dashed line), (iv) minimum-variance smoother (dot-dashed line) and (v) $H_\infty$ smoother
(solid line).


Simulations were conducted in which a minimum-variance filter and a fixed-interval
smoother were employed to recover the speech message from noisy
measurements. The results are provided in Fig. 9. As expected, the smoother out-performs
the filter. Searches were conducted for minimum values of $\gamma$ such that
solutions to the design Riccati difference equations were positive definite for each
noise realisation. The performance of the resulting $H_\infty$ filter and smoother is
indicated by the dashed line and solid line of the figure. It can be seen for this
example that the $H_\infty$ filter out-performs the Kalman filter. The figure also indicates
that the robust smoother provides the best performance and exhibits about 4 dB
reduction in mean-square-error compared to the Kalman filter at 0 dB SNR. This
performance benefit needs to be reconciled against the extra calculation cost of
combining robust forward and backward state predictors within (101) – (103).
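
A search for the smallest feasible $\gamma$ can be sketched with bisection. This is a minimal illustration, not the procedure used in [28]: it assumes NumPy, a time-invariant model, that feasibility is monotonic in $\gamma$, and that the lower bracket `lo` is infeasible. The names `riccati_feasible` and `min_gamma` are hypothetical. Feasibility is taken to mean that $P_{k/k-1}$ from (76) and the inverse appearing in the compact Riccati form (80) remain positive definite over the horizon.

```python
import numpy as np

def riccati_feasible(gamma, A, B, C1, C2, Q, R, M0, N):
    """Check positive definiteness along N steps of the design Riccati recursion."""
    M = M0.copy()
    R_inv = np.linalg.inv(R)
    for _ in range(N):
        P_inv = np.linalg.inv(M) - C1.T @ C1 / gamma**2       # inverse of (76)
        inner = P_inv + C2.T @ R_inv @ C2
        if min(np.linalg.eigvalsh(P_inv).min(),
               np.linalg.eigvalsh(inner).min()) <= 0.0:
            return False
        M = A @ np.linalg.inv(inner) @ A.T + B @ Q @ B.T      # compact form (80)
    return True

def min_gamma(A, B, C1, C2, Q, R, M0, N, lo=1e-3, hi=1e3, tol=1e-4):
    """Bisection for the smallest feasible gamma in [lo, hi] (lo assumed infeasible)."""
    if not riccati_feasible(hi, A, B, C1, C2, Q, R, M0, N):
        raise ValueError("upper bracket is infeasible")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if riccati_feasible(mid, A, B, C1, C2, Q, R, M0, N):
            hi = mid
        else:
            lo = mid
    return hi

# Scalar illustration with assumed values.
A = np.array([[0.9]]); B = np.eye(1); C = np.eye(1)
Q = np.eye(1); R = np.eye(1)
print(min_gamma(A, B, C, C, Q, R, np.eye(1), N=200))
```

In practice a design would use a value somewhat above the returned bound, since the minimum feasible $\gamma$ sits on the edge of the positivity conditions.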


9.3.7.5 High SNR and Low SNR Asymptotes 


An understanding of why robust solutions are beneficial in the presence of
uncertainties can be gleaned by examining single-input-single-output filtering and
equalisation. Consider a time-invariant plant having the canonical form


$$A = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-1} \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} c & 0 & \cdots & 0 \end{bmatrix},$$

$a_0, \ldots, a_{n-1}, c \in \mathbb{R}$. Since the plant is time-invariant, the transfer function exists and
is denoted by $G(z)$. Some notation is defined prior to stating some observations for
output estimation problems. Suppose that an $H_\infty$ filter has been constructed for the

“In computer science, we stand on each other's feet.” Brian K. Reid 
above plant. Let the $H_\infty$ algebraic Riccati equation solution, predictor gain, filter
gain, predictor, filter and smoother transfer function matrices be denoted by $P^{(\infty)}$,
$K^{(\infty)}$, $L^{(\infty)}$, $H_P^{(\infty)}(z)$, $H_F^{(\infty)}(z)$ and $H_S^{(\infty)}(z)$ respectively. The $H_\infty$ filter transfer
function matrix may be written as $H_F^{(\infty)}(z) = L^{(\infty)} + (I - L^{(\infty)})H_P^{(\infty)}(z)$ where
$L^{(\infty)} = I - R(\Omega^{(\infty)})^{-1}$. The transfer function matrix of the map from the inputs to
the filter output estimation error is


$$R_{ei}^{(\infty)}(z) = \begin{bmatrix} H_F^{(\infty)}(z)\sigma_v & \left(H_F^{(\infty)}(z) - I\right)G(z)\sigma_w \end{bmatrix}. \tag{106}$$


The $H_\infty$ smoother transfer function matrix can be written as $H_S^{(\infty)}(z) = I -
R\left(I - (H_P^{(\infty)})^H(z)\right)(\Omega^{(\infty)})^{-1}\left(I - H_P^{(\infty)}(z)\right)$. Similarly, let $P^{(2)}$, $K^{(2)}$, $L^{(2)}$, $H_F^{(2)}(z)$
and $H_S^{(2)}(z)$ denote the minimum-variance algebraic Riccati equation solution,
predictor gain, filter gain, filter and smoother transfer function matrices
respectively.

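Proposition 1 [28]: In the above output estimation problem:


(i)
$$\lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_F^{(\infty)}(e^{j\omega})\right| = 1. \tag{107}$$
(ii)
$$\lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_F^{(2)}(e^{j\omega})\right| < \lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_F^{(\infty)}(e^{j\omega})\right|. \tag{108}$$
(iii)
$$\lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|R_{ei}^{(\infty)}(R_{ei}^{(\infty)})^H(e^{j\omega})\right| = \lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_F^{(\infty)}(H_F^{(\infty)})^H(e^{j\omega})\right|\sigma_v^2. \tag{109}$$
(iv)
$$\lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_S^{(\infty)}(e^{j\omega})\right| = 1. \tag{110}$$
(v)
$$\lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_S^{(2)}(e^{j\omega})\right| < \lim_{\sigma_v^2 \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_S^{(\infty)}(e^{j\omega})\right|. \tag{111}$$


Outline of Proof: (i) Let $p_{(1,1)}^{(\infty)}$ denote the (1,1) component of $P^{(\infty)}$. The low
measurement noise observation (107) follows from $L^{(\infty)} = 1 - \sigma_v^2\left(c^2p_{(1,1)}^{(\infty)} + \sigma_v^2\right)^{-1}$,
which implies $\lim_{\sigma_v^2 \to 0} L^{(\infty)} = 1$.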


“All programmers are optimists.” Frederick P. Brooks, Jr 
(ii) Observation (108) follows from $p_{(1,1)}^{(\infty)} > p_{(1,1)}^{(2)}$, which results in $\lim_{\sigma_v^2 \to 0} L^{(\infty)} > \lim_{\sigma_v^2 \to 0} L^{(2)}$.


(iii) Observation (109) follows immediately from the application of (107) in (106).
(iv) Observation (110) follows from $\lim_{\sigma_v^2 \to 0} \sigma_v^2\left(c^2p_{(1,1)}^{(\infty)} + \sigma_v^2\right)^{-1} = 0$.


(v) Observation (111) follows from $p_{(1,1)}^{(\infty)} > p_{(1,1)}^{(2)}$, which results in $\lim_{\sigma_v^2 \to 0} \Omega^{(\infty)} > \lim_{\sigma_v^2 \to 0} \Omega^{(2)}$.


An interpretation of (107) and (110) is that the maximum magnitudes of the filters
and smoothers asymptotically approach a short circuit (or zero impedance) when
$\sigma_v^2 \to 0$. From (108) and (111), as $\sigma_v^2 \to 0$, the maximum magnitudes of the $H_\infty$
solutions approach the short circuit asymptote closer than the optimal minimum-variance
solutions. That is, for low measurement noise, the robust solutions
accommodate some uncertainty by giving greater weighting to the data. Since
$\lim_{\sigma_v^2 \to 0} \|R_{ei}\|_\infty \to \sigma_v$ and the $H_\infty$ filter achieves the performance $\|R_{ei}\|_\infty < \gamma$, it
follows from (109) that an a priori design estimate is $\gamma = \sigma_v$.
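
The low measurement noise behaviour can be illustrated for a first-order plant. The following is a minimal sketch, assuming a scalar model $x_{k+1} = ax_k + w_k$, $z_k = cx_k + v_k$ with $C_1 = C_2 = c$, and assuming that $P^{(\infty)}$ is the steady-state value of $P_{k/k-1}$ from (76); the function name `steady_state_filter_gain` is hypothetical. It evaluates $L = 1 - \sigma_v^2(c^2p + \sigma_v^2)^{-1}$ for the minimum-variance and $H_\infty$ designs as $\sigma_v^2$ decreases.

```python
def steady_state_filter_gain(a, c, q, r, gamma=None, iters=500):
    """Scalar steady-state L = 1 - r / (c^2 p + r); gamma=None gives the minimum-variance design."""
    g2_inv = 0.0 if gamma is None else 1.0 / gamma**2
    m = q
    for _ in range(iters):
        # Scalar compact Riccati recursion, cf. (80)
        m = a * a / (1.0 / m + c * c / r - g2_inv * c * c) + q
    p = 1.0 / (1.0 / m - g2_inv * c * c)    # scalar version of (76)
    return 1.0 - r / (c * c * p + r)

for r in (1e-2, 1e-4, 1e-6):
    l2 = steady_state_filter_gain(0.9, 1.0, 1.0, r)
    linf = steady_state_filter_gain(0.9, 1.0, 1.0, r, gamma=2.0)
    print(r, l2, linf)
```

Both gains tend to one as $\sigma_v^2 \to 0$, with the $H_\infty$ gain the larger of the two, which is consistent with observations (107) and (108) for this example.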

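Suppose now that a time-invariant plant has the transfer function
$G(z) = C(zI - A)^{-1}B + D$, where $A$, $B$ and $C$ are defined above together with $D \in \mathbb{R}$.
Consider an input estimation (or equalisation) problem in which the transfer
function matrix of the causal $H_\infty$ solution that estimates the input of the plant is

$$H_F^{(\infty)}(z) = QD^T(\Omega^{(\infty)})^{-1} - QD^T(\Omega^{(\infty)})^{-1}H_P^{(\infty)}(z). \tag{112}$$


The transfer function matrix of the map from the inputs to the input estimation
error is


$$R_{ei}^{(\infty)}(z) = \begin{bmatrix} H_F^{(\infty)}(z)\sigma_v & \left(H_F^{(\infty)}(z)G(z) - I\right)\sigma_w \end{bmatrix}. \tag{113}$$

The noncausal $H_\infty$ transfer function matrix of the input estimator can be written as
$H_S^{(\infty)}(z) = QG^H(z)\left(I - (H_P^{(\infty)}(z))^H\right)(\Omega^{(\infty)})^{-1}\left(I - H_P^{(\infty)}(z)\right)$.

Proposition 2 [28]: For the above input estimation problem:
(i)
$$\lim_{\sigma_v^{-2} \to 0}\ \sup_{\omega \in \{-\pi, \pi\}} \left|H_F^{(\infty)}(e^{j\omega})\right| = 0. \tag{114}$$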


“Always code as if the guy who ends up maintaining your code will be a violent psychopath who 
knows where you live.” Damian Conway 
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(ii) 
lim sup H(e”)|> lim sup Heer), (115) 
oy 90 we{-z,7} o,° 0 we{-2,7} 
(iii) 
lim sup |RY?(R?)"(e””) 
O90 wet—z,2} 
= lim sup (HOG ”)-N AME (e”)-1)"|ox. (116) 
0, 0 we{-z,7} 
(iv) 
lim sup |H,°(e"”)|=0. (117) 
oy 90 we{-z,7} 
(v) 
lim sup HY (e"*)|> lim sup HO (e?”)|. (118) 


0, 90 we{-z,7} 0, 90 we{-z,7} 


Outline of Proof: (i) and (iv) The high measurement noise observations (114) and 
(117) follow from Q® = ODan +do.+o. which implies Jim (Q3?y" = 0). 
(ii) and (v) The observations (115) and (118) follow from roe Days which 
results in lim QY? > lima”. 

o>0 «0 


(iii) The observation (116) follows immediately from the application of (114) in 
(113). 


An interpretation of (114) and (117) is that the maximum magnitudes of the equalisers asymptotically approach an open circuit (or infinite impedance) when $\sigma_v^{-2} \to 0$. From (115) and (118), as $\sigma_v^{-2} \to 0$, the maximum magnitude of the $H_\infty$ solution approaches the open circuit asymptote closer than that of the optimum minimum-variance solution. That is, under high measurement noise conditions, robust solutions accommodate some uncertainty by giving less weighting to the data. Since $\|R_{ei}R_{ei}^H\|_\infty$ approaches $\sigma_w^2$ as $\sigma_v^{-2} \to 0$, and the $H_\infty$ solution achieves the performance $\|R_{ei}R_{ei}^H\|_\infty < \gamma^2$, it follows from (116) that an a priori design estimate is $\gamma = \sigma_w$.

Proposition 1 follows intuitively. Indeed, the short circuit asymptote is sometimes referred to as the singular filter. Proposition 2 may appear counter-intuitive and warrants further explanation. When the plant is minimum phase and the measurement noise is negligible, the equaliser inverts the plant. Conversely, when the equalisation problem is dominated by measurement noise, the solution is a low gain filter; that is, the estimation error is minimised by giving less weighting to the data.


9.4 Chapter Summary 


Uncertainties are invariably present within the specification of practical problems. Consequently, robust solutions have arisen to accommodate uncertain inputs and plant models. The $H_\infty$ performance objective is to minimise the ratio of the output energy to the input energy of an error system, that is, minimise

$$\sup_{\|i\|_2 \neq 0} \frac{\|\tilde{y}\|_2^2}{\|i\|_2^2} < \gamma^2$$

for some $\gamma \in \mathbb{R}$. In the time-invariant case, the objective is equivalent to minimising the maximum magnitude of the error power spectrum density.


Predictors, filters and smoothers that satisfy the above performance objective are found by applying the Bounded Real Lemma. The standard solution structures are retained but larger design error covariances are employed to account for the presence of uncertainty. In continuous-time output estimation, the error covariance is found from the solution of

$$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t) - P(t)\left(C^T(t)R^{-1}(t)C(t) - \gamma^{-2}C^T(t)C(t)\right)P(t).$$

Discrete-time predictors, filters and smoothers for output estimation rely on the solution of

$$P_{k+1} = A_kP_kA_k^T + B_kQ_kB_k^T - A_kP_k\begin{bmatrix} C_k^T & C_k^T \end{bmatrix}\begin{bmatrix} C_kP_kC_k^T - \gamma^2 I & C_kP_kC_k^T \\ C_kP_kC_k^T & R_k + C_kP_kC_k^T \end{bmatrix}^{-1}\begin{bmatrix} C_k \\ C_k \end{bmatrix}P_kA_k^T.$$

It follows that the $H_\infty$ designs revert to the optimum minimum-variance solutions as $\gamma^{-2} \to 0$. Since robust solutions are conservative, the art of design involves finding satisfactory trade-offs between average and worst-case performance criteria, namely, tweaking $\gamma$.
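The reversion to the minimum-variance design can be checked numerically. The following Python sketch integrates the scalar, time-invariant form of the continuous-time Riccati equation above by Euler steps until it settles. The parameter values (a = -1, b = q = c = r = 1), the step size and the function name are illustrative assumptions, not values taken from the text.

```python
import math

def riccati_steady_state(gamma_inv_sq, a=-1.0, b=1.0, q=1.0, c=1.0, r=1.0,
                         dt=1e-3, steps=20000):
    """Euler-integrate dP/dt = 2aP + b^2 q - P^2 (c^2/r - gamma^-2 c^2) from P(0) = 0."""
    P = 0.0
    for _ in range(steps):
        P += dt * (2.0 * a * P + b * b * q
                   - P * P * (c * c / r - gamma_inv_sq * c * c))
    return P

P_mv = riccati_steady_state(0.0)       # minimum-variance design (gamma^-2 = 0)
P_hinf = riccati_steady_state(0.25)    # robust design with gamma = 2 (assumed value)
P_large = riccati_steady_state(1e-12)  # very large gamma
print(P_mv, P_hinf, P_large)
```

For these assumed values the robust design settles on a larger error covariance than the minimum-variance design, and the two coincide as $\gamma^{-2}$ shrinks, which is the trade-off that tuning $\gamma$ controls.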


A summary of suggested approaches for different linear estimation problem conditions is presented in Table 1. When the problem parameters are known precisely then the optimum minimum-variance solutions cannot be improved upon. However, when the inputs or the models are uncertain, robust solutions may provide improved mean-square-error performance. In the case of low measurement noise output-estimation, the benefit arises because greater weighting is given to the data. Conversely, for high measurement noise input estimation, robust solutions accommodate uncertainty by giving less weighting to the data.


PROBLEM CONDITIONS: Gaussian process and measurement noises, known 2nd-order statistics. Known system model parameters.
SUGGESTED APPROACHES:
1. Optimal minimum-variance (or Kalman) filter.
2. Fixed-lag smoothers, which improve on filter performance (see Lemma 3 and Example 1 of Chapter 7). They suit on-line applications and have low additional complexity. A sufficiently large smoothing lag results in optimal performance (see Example 3 of Chapter 7).
3. Maximum-likelihood (or Rauch-Tung-Striebel) smoothers, which also improve on filter performance (see Lemma 6 of Chapter 6 and Lemma 4 of Chapter 7). They can provide close to optimal performance (see Example 5 of Chapter 6).
4. The minimum-variance smoother provides the best performance (see Lemma 12 of Chapter 6 and Lemma 8 of Chapter 7) at the cost of increased complexity (see Example 5 of Chapter 6 and Example 2 of Chapter 7).

PROBLEM CONDITIONS: Uncertain process and measurement noises, known 2nd-order statistics. Known system model parameters.
SUGGESTED APPROACHES:
1. Optimal minimum-variance filter, which does not rely on Gaussian noise assumptions.
2. Optimal minimum-variance smoother, which similarly does not rely on Gaussian noise assumptions (see Example 6 of Chapter 6).
3. Robust filter which trades off $H_\infty$ performance (see Lemmas 2, 9) and mean-square-error performance (see Example 3).
4. Robust smoother which trades off $H_\infty$ performance (see Lemmas 5, 10) and mean-square-error performance (see Example 3).

PROBLEM CONDITIONS: Uncertain processes and measurement noises. Uncertain system model parameters.
SUGGESTED APPROACHES:
1. Robust filter (see Example 4).
2. Robust smoother (see Example 4).
3. Robust filter or smoother with scaled inputs (see Lemma 3).

Table 1. Suggested approaches for different linear estimation problem conditions.


9.5 Problems 


Problem 1 [31].
(i) Consider a system $\mathcal{G}$ having the state-space representation $\dot{x}(t) = Ax(t) + Bw(t)$, $y(t) = Cx(t)$. Show that if there exists a matrix $P = P^T > 0$ such that

$$\begin{bmatrix} A^TP + PA + C^TC & PB \\ B^TP & -\gamma^2 I \end{bmatrix} < 0$$

then $x^T(T)Px(T) - x^T(0)Px(0) + \int_0^T y^T(t)y(t)\,dt < \gamma^2\int_0^T w^T(t)w(t)\,dt$.
(ii) Generalise (i) for the case where $y(t) = Cx(t) + Dw(t)$.


Problem 2. Consider a system $\mathcal{G}$ modelled by $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $y(t) = C(t)x(t) + D(t)w(t)$. Suppose that the Riccati differential equation

$$\begin{aligned} -\dot{P}(t) = {} & P(t)\left(A(t) + B(t)M^{-1}(t)D^T(t)C(t)\right) + \left(A(t) + B(t)M^{-1}(t)D^T(t)C(t)\right)^TP(t) \\ & + P(t)B(t)M^{-1}(t)B^T(t)P(t) + C^T(t)\left(I + D(t)M^{-1}(t)D^T(t)\right)C(t), \end{aligned}$$

$M(t) = \gamma^2I - D^T(t)D(t) > 0$, has a solution on [0, T]. Show that $\|\mathcal{G}\|_\infty < \gamma$ for any $w \in \mathcal{L}_2$. (Hint: define $V(x(t)) = x^T(t)P(t)x(t)$ and show that $\dot{V}(x(t)) + y^T(t)y(t) - \gamma^2w^T(t)w(t) < 0$.)

Problem 3. For measurements z(¢) = y(4) + (4) of a system realised by x(t) = 


A(f)x(t) + B(t)w(t), v(t) = C(Hx(t), show that the map from the inputs i = i to 
w 


the H., fixed-interval smoother error e(t|T) is 


X(t) A(t)— K(#)C(t) 0 B(t) KW) (t) 
-&(t) |=|-C’@R'OCM ATO-C7K™(H 0 -C7@R"'(H) . , 
M(t |T) C(t) R(t)K' (t) 0 0 
v(t) | 
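Problem 4.
(i) For a $\mathcal{G}$ modelled by $x_{k+1} = A_kx_k + B_kw_k$, $y_k = C_kx_k$, show that the existence of a solution to the Riccati difference equation

$$P_k = A_k^TP_{k+1}A_k + \gamma^{-2}A_k^TP_{k+1}B_k(I - \gamma^{-2}B_k^TP_{k+1}B_k)^{-1}B_k^TP_{k+1}A_k + C_k^TC_k$$

is sufficient for $x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k + y_k^Ty_k - \gamma^2w_k^Tw_k \le 0$. Hint: construct $x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k$ and show that

$$x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k + y_k^Ty_k - \gamma^2w_k^Tw_k = -\gamma^2p_k^T(I - \gamma^{-2}B_k^TP_{k+1}B_k)p_k,$$

where $p_k = w_k - \gamma^{-2}(I - \gamma^{-2}B_k^TP_{k+1}B_k)^{-1}B_k^TP_{k+1}A_kx_k$.

(ii) Show that $-x_0^TP_0x_0 + \sum_{k=0}^{N-1} y_k^Ty_k - \gamma^2\sum_{k=0}^{N-1} w_k^Tw_k \le 0$.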


Problem 5. Now consider the model $x_{k+1} = A_kx_k + B_kw_k$, $y_k = C_kx_k + D_kw_k$ and show that the existence of a solution to the Riccati difference equation

$$P_k = A_k^TP_{k+1}A_k + C_k^TC_k + \gamma^{-2}(A_k^TP_{k+1}B_k + C_k^TD_k)(I - \gamma^{-2}B_k^TP_{k+1}B_k - \gamma^{-2}D_k^TD_k)^{-1}(B_k^TP_{k+1}A_k + D_k^TC_k)$$

is sufficient for $x_{k+1}^TP_{k+1}x_{k+1} - x_k^TP_kx_k + y_k^Ty_k - \gamma^2w_k^Tw_k \le 0$. Hint: define $p_k = w_k - \gamma^{-2}(I - \gamma^{-2}B_k^TP_{k+1}B_k - \gamma^{-2}D_k^TD_k)^{-1}(B_k^TP_{k+1}A_k + D_k^TC_k)x_k$.

Problem 6. Suppose that a predictor attains a $H_\infty$ performance objective, that is, the conditions of Lemma 8 are satisfied. Show that using the predicted states to construct filtered output estimates $\hat{y}_{k/k}$ results in $\tilde{y}_{k/k} = y_k - \hat{y}_{k/k} \in \ell_2$.


9.6 Glossary 


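$\mathcal{L}_\infty$: The Lebesgue $\infty$-space defined as the set of continuous-time systems having finite $\infty$-norm.

$R_{ei} \in \mathcal{L}_\infty$: The map $R_{ei}$ from the inputs $i(t)$ to the estimation error $\tilde{y}(t)$ satisfies $\int_0^T \tilde{y}^T(t)\tilde{y}(t)\,dt - \gamma^2\int_0^T i^T(t)i(t)\,dt < 0$. Therefore, $i \in \mathcal{L}_2$ implies $\tilde{y} \in \mathcal{L}_2$.

$\ell_\infty$: The Lebesgue $\infty$-space defined as the set of discrete-time systems having finite $\infty$-norm.

$R_{ei} \in \ell_\infty$: The map $R_{ei}$ from the inputs $i_k$ to the estimation error $\tilde{y}_k$ satisfies $\sum_{k=0}^{N-1} \tilde{y}_k^T\tilde{y}_k - \gamma^2\sum_{k=0}^{N-1} i_k^Ti_k < 0$. Therefore, $i \in \ell_2$ implies $\tilde{y} \in \ell_2$.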
10. Nonlinear Prediction, Filtering and Smoothing


10.1 Introduction 


The Kalman filter is widely used for linear estimation problems where its behaviour is well-understood. Under prescribed conditions, the estimated states are unbiased and stability is guaranteed. Many real-world problems are nonlinear, which requires amendments to linear solutions. If the nonlinear models can be expressed in a state-space setting then the Kalman filter may find utility by applying linearisations at each time step. In the two-dimensional case, linearising means finding tangents to the curves of interest about the current estimates, so that the standard filter recursions can be employed in tandem to produce predictions for the next step. This approach is known as extended Kalman filtering; see [1] - [5].


Extended Kalman filters (EKFs) revert to optimal Kalman filters when the 
problems become linear. Thus, EKFs can yield approximate minimum-variance 
estimates. However, there are no accompanying performance guarantees and they 
fall into the try-at-your-own-risk category. Indeed, Anderson and Moore [3] 
caution that the EKF “can be satisfactory on occasions”. A number of 
compounding factors can cause performance degradation. The approximate 
linearisations may be crude and are carried out about estimated states (as opposed 
to true states). Observability problems occur when the variables do not map onto 
each other, giving rise to discontinuities within estimated state trajectories. 
Singularities within functions can result in non-positive solutions to the design 
Riccati equations and lead to instabilities. 


The discussion includes suggestions for performance improvement and is organised as follows. The next section begins with Taylor series expansions, which are prerequisites for linearisation. First, second and third-order EKFs are then derived. EKFs tend to be prone to instability and a way of enforcing stability is to masquerade the design Riccati equation by a faux version. This faux algebraic Riccati equation technique [6] - [10] is presented in Section 10.3. In Section 10.4, the higher order terms discarded by an EKF are treated as uncertainties. It is shown that a robust EKF arises by solving a scaled $H_\infty$ problem in lieu of one possessing uncertainties. Nonlinear smoother procedures can be designed similarly. The use of fixed-lag and Rauch-Tung-Striebel smoothers may be preferable from a complexity perspective. However, the approximate minimum-variance and robust smoothers, which are presented in Section 10.5, revert to optimal solutions when the nonlinearities and uncertainties diminish. Another way of guaranteeing stability is by imposing constraints and one such approach is discussed in Section 10.6.


10.2 Extended Kalman Filtering 
10.2.1 Taylor Series Expansion 


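A nonlinear function $a_k(x): \mathbb{R}^n \to \mathbb{R}$ having n continuous derivatives may be expanded as a Taylor series about a point $x_0$

$$\begin{aligned} a_k(x) = {} & a_k(x_0) + \frac{1}{1!}(x - x_0)^T\nabla^T a_k(x_0) + \frac{1}{2!}(x - x_0)^T\nabla^T\nabla a_k(x_0)(x - x_0) \\ & + \frac{1}{3!}(x - x_0)^T\nabla^T(x - x_0)^T\nabla^T\nabla a_k(x_0)(x - x_0) \\ & + \frac{1}{4!}(x - x_0)^T\nabla^T(x - x_0)^T\nabla^T(x - x_0)^T\nabla^T\nabla a_k(x_0)(x - x_0) + \dots, \end{aligned} \tag{1}$$

where $\nabla a_k = \begin{bmatrix} \dfrac{\partial a_k}{\partial x_1} & \cdots & \dfrac{\partial a_k}{\partial x_n} \end{bmatrix}$ is known as the gradient of $a_k(.)$ and

$$\nabla^T\nabla a_k = \begin{bmatrix} \dfrac{\partial^2 a_k}{\partial x_1^2} & \dfrac{\partial^2 a_k}{\partial x_1\partial x_2} & \cdots & \dfrac{\partial^2 a_k}{\partial x_1\partial x_n} \\ \dfrac{\partial^2 a_k}{\partial x_2\partial x_1} & \dfrac{\partial^2 a_k}{\partial x_2^2} & \cdots & \dfrac{\partial^2 a_k}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 a_k}{\partial x_n\partial x_1} & \dfrac{\partial^2 a_k}{\partial x_n\partial x_2} & \cdots & \dfrac{\partial^2 a_k}{\partial x_n^2} \end{bmatrix}$$

is called a Hessian matrix.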


10.2.2 Nonlinear Signal Models


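Consider nonlinear systems having state-space representations of the form

$$x_{k+1} = a_k(x_k) + b_k(x_k)w_k, \tag{2}$$
$$y_k = c_k(x_k), \tag{3}$$

where $a_k(.)$, $b_k(.)$ and $c_k(.)$ are continuous differentiable functions. For a scalar function, $a_k(x): \mathbb{R} \to \mathbb{R}$, its Taylor series about $x = x_0$ may be written as

$$\begin{aligned} a_k(x) = {} & a_k(x_0) + (x - x_0)\left.\frac{\partial a_k}{\partial x}\right|_{x=x_0} + \frac{1}{2}(x - x_0)^2\left.\frac{\partial^2 a_k}{\partial x^2}\right|_{x=x_0} \\ & + \frac{1}{6}(x - x_0)^3\left.\frac{\partial^3 a_k}{\partial x^3}\right|_{x=x_0} + \dots + \frac{1}{n!}(x - x_0)^n\left.\frac{\partial^n a_k}{\partial x^n}\right|_{x=x_0}. \end{aligned} \tag{4}$$

Similarly, Taylor series for $b_k(x): \mathbb{R} \to \mathbb{R}$ and $c_k(x): \mathbb{R} \to \mathbb{R}$ about $x = x_0$ are

$$\begin{aligned} b_k(x) = {} & b_k(x_0) + (x - x_0)\left.\frac{\partial b_k}{\partial x}\right|_{x=x_0} + \frac{1}{2}(x - x_0)^2\left.\frac{\partial^2 b_k}{\partial x^2}\right|_{x=x_0} \\ & + \frac{1}{6}(x - x_0)^3\left.\frac{\partial^3 b_k}{\partial x^3}\right|_{x=x_0} + \dots + \frac{1}{n!}(x - x_0)^n\left.\frac{\partial^n b_k}{\partial x^n}\right|_{x=x_0} \end{aligned} \tag{5}$$

and

$$\begin{aligned} c_k(x) = {} & c_k(x_0) + (x - x_0)\left.\frac{\partial c_k}{\partial x}\right|_{x=x_0} + \frac{1}{2}(x - x_0)^2\left.\frac{\partial^2 c_k}{\partial x^2}\right|_{x=x_0} \\ & + \frac{1}{6}(x - x_0)^3\left.\frac{\partial^3 c_k}{\partial x^3}\right|_{x=x_0} + \dots + \frac{1}{n!}(x - x_0)^n\left.\frac{\partial^n c_k}{\partial x^n}\right|_{x=x_0}, \end{aligned} \tag{6}$$

respectively.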
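The effect of truncating the scalar series (4) can be illustrated with a short Python sketch. It evaluates the first, second and third-order truncations of sin(x); the expansion point 0.5, the evaluation point 0.8 and the helper name `taylor_approx` are illustrative assumptions.

```python
import math

def taylor_approx(derivs_at_x0, x0, x, order):
    """Evaluate the scalar Taylor series (4) truncated after the given order.

    derivs_at_x0[j] is the j-th derivative of the function at x0 (j = 0 is its value).
    """
    return sum(derivs_at_x0[j] * (x - x0) ** j / math.factorial(j)
               for j in range(order + 1))

x0, x = 0.5, 0.8
# Value and first three derivatives of sin(x) at x0
derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
errors = [abs(math.sin(x) - taylor_approx(derivs, x0, x, n)) for n in (1, 2, 3)]
print(errors)
```

For this smooth function the approximation error shrinks as more terms are retained, which is the motivation for the higher order filters that follow.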


10.2.3 First-Order Extended Kalman Filter 


Suppose that filtered estimates $\hat{x}_{k/k}$ of $x_k$ are desired given observations

$$z_k = c_k(x_k) + v_k, \tag{7}$$

where $v_k$ is a measurement noise sequence. A first-order EKF for the above problem is developed below. Following the approach within [3], the nonlinear system (2) - (3) is approximated by

$$x_{k+1} = A_kx_k + B_kw_k + \mu_k, \tag{8}$$
$$y_k = C_kx_k + \pi_k, \tag{9}$$

where $A_k$, $B_k$, $C_k$, $\mu_k$ and $\pi_k$ are found from suitable truncations of the Taylor series for each nonlinearity. From Chapter 4, a filter for the above model is given by

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1} - \pi_k), \tag{10}$$
$$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k} + \mu_k, \tag{11}$$

where $L_k = P_{k/k-1}C_k^T\Omega_k^{-1}$ is the filter gain, in which $\Omega_k = C_kP_{k/k-1}C_k^T + R_k$, $P_{k/k} = P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}$ and $P_{k+1/k} = A_kP_{k/k}A_k^T + B_kQ_kB_k^T$. It is common practice (see [1] - [5]) to linearise about the current conditional mean estimate, retain up to first order terms within the corresponding Taylor series and assume $B_k = b_k(\hat{x}_{k/k})$. This leads to

$$a_k(x_k) \approx a_k(\hat{x}_{k/k}) + (x_k - \hat{x}_{k/k})^T\nabla^T a_k|_{x=\hat{x}_{k/k}} = A_kx_k + \mu_k \tag{12}$$

and

$$c_k(x_k) \approx c_k(\hat{x}_{k/k-1}) + (x_k - \hat{x}_{k/k-1})^T\nabla^T c_k|_{x=\hat{x}_{k/k-1}} = C_kx_k + \pi_k, \tag{13}$$

where $A_k = \nabla a_k(x)|_{x=\hat{x}_{k/k}}$, $\mu_k = a_k(\hat{x}_{k/k}) - A_k\hat{x}_{k/k}$, $C_k = \nabla c_k(x)|_{x=\hat{x}_{k/k-1}}$ and $\pi_k = c_k(\hat{x}_{k/k-1}) - C_k\hat{x}_{k/k-1}$. Substituting for $\mu_k$ and $\pi_k$ into (10) - (11) gives

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - c_k(\hat{x}_{k/k-1})), \tag{14}$$
$$\hat{x}_{k+1/k} = a_k(\hat{x}_{k/k}). \tag{15}$$

Note that nonlinearities enter into the state correction (14) and prediction (15), whereas linearised matrices $A_k$, $B_k$ and $C_k$ are employed in the Riccati equation and gain calculations.


In the case of scalar states, the linearisations are $A_k = \left.\dfrac{\partial a_k}{\partial x}\right|_{x=\hat{x}_{k/k}}$ and $C_k = \left.\dfrac{\partial c_k}{\partial x}\right|_{x=\hat{x}_{k/k-1}}$. In texts on optimal filtering, the recursions (14) - (15) are either called a first-order EKF or simply an EKF, see [1] - [5]. Two higher order versions are developed below.
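One cycle of the scalar first-order EKF can be sketched in Python as follows. The function name `ekf_step`, its argument list and the assumption $B_k = 1$ are illustrative choices made here; the nonlinear maps and their derivatives are passed in as callables.

```python
def ekf_step(x_pred, P_pred, z, a, da, c, dc, Q, R):
    """One cycle of a scalar first-order EKF, following (14)-(15).

    a, c   : nonlinear state and output maps
    da, dc : their first derivatives
    B_k = 1 is assumed, so the process noise enters the Riccati update as Q.
    """
    C = dc(x_pred)                          # C_k, linearised about x_{k/k-1}
    omega = C * P_pred * C + R              # innovation variance
    L = P_pred * C / omega                  # filter gain
    x_filt = x_pred + L * (z - c(x_pred))   # correction (14) uses the nonlinear c
    P_filt = P_pred - L * C * P_pred
    A = da(x_filt)                          # A_k, linearised about x_{k/k}
    x_next = a(x_filt)                      # prediction (15) uses the nonlinear a
    P_next = A * P_filt * A + Q
    return x_filt, P_filt, x_next, P_next

# With linear maps the step reduces to an ordinary Kalman filter cycle.
out = ekf_step(0.0, 1.0, 1.0,
               a=lambda x: 0.5 * x, da=lambda x: 0.5,
               c=lambda x: x, dc=lambda x: 1.0,
               Q=0.05, R=1.0)
print(out)
```

The nonlinear functions appear only in the state correction and prediction, while the derivatives feed the gain and covariance recursions, mirroring the note following (15).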


10.2.4 Second-Order Extended Kalman Filter 


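Truncating the series (1) after the second-order term and observing that $(x_k - \hat{x}_{k/k})^T\nabla^T$ is a scalar yields

$$\begin{aligned} a_k(x_k) &\approx a_k(\hat{x}_{k/k}) + (x_k - \hat{x}_{k/k})^T\nabla^T a_k|_{x=\hat{x}_{k/k}} + \frac{1}{2}(x_k - \hat{x}_{k/k})^T\nabla^T\nabla a_k|_{x=\hat{x}_{k/k}}(x_k - \hat{x}_{k/k}) \\ &\approx a_k(\hat{x}_{k/k}) + (x_k - \hat{x}_{k/k})^T\nabla^T a_k|_{x=\hat{x}_{k/k}} + \frac{1}{2}\nabla P_{k/k}\nabla^T a_k|_{x=\hat{x}_{k/k}} \\ &= A_kx_k + \mu_k, \end{aligned} \tag{16}$$

where $A_k = \nabla a_k(x)|_{x=\hat{x}_{k/k}}$ and $\mu_k = a_k(\hat{x}_{k/k}) - A_k\hat{x}_{k/k} + \frac{1}{2}\nabla P_{k/k}\nabla^T a_k|_{x=\hat{x}_{k/k}}$. Similarly for the system output,

$$\begin{aligned} c_k(x_k) &\approx c_k(\hat{x}_{k/k-1}) + (x_k - \hat{x}_{k/k-1})^T\nabla^T c_k|_{x=\hat{x}_{k/k-1}} + \frac{1}{2}(x_k - \hat{x}_{k/k-1})^T\nabla^T\nabla c_k|_{x=\hat{x}_{k/k-1}}(x_k - \hat{x}_{k/k-1}) \\ &\approx c_k(\hat{x}_{k/k-1}) + (x_k - \hat{x}_{k/k-1})^T\nabla^T c_k|_{x=\hat{x}_{k/k-1}} + \frac{1}{2}\nabla P_{k/k-1}\nabla^T c_k|_{x=\hat{x}_{k/k-1}} \\ &= C_kx_k + \pi_k, \end{aligned} \tag{17}$$

where $C_k = \nabla c_k(x)|_{x=\hat{x}_{k/k-1}}$ and $\pi_k = c_k(\hat{x}_{k/k-1}) - C_k\hat{x}_{k/k-1} + \frac{1}{2}\nabla P_{k/k-1}\nabla^T c_k|_{x=\hat{x}_{k/k-1}}$.

Substituting for $\mu_k$ and $\pi_k$ into the filtering and prediction recursions (10) - (11) yields the second-order EKF

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k\left(z_k - c_k(\hat{x}_{k/k-1}) - \frac{1}{2}\nabla P_{k/k-1}\nabla^T c_k|_{x=\hat{x}_{k/k-1}}\right), \tag{18}$$
$$\hat{x}_{k+1/k} = a_k(\hat{x}_{k/k}) + \frac{1}{2}\nabla P_{k/k}\nabla^T a_k|_{x=\hat{x}_{k/k}}. \tag{19}$$

The above form is described in [2]. The further simplifications $\nabla P_{k/k}\nabla^T a_k|_{x=\hat{x}_{k/k}} = \mathrm{tr}\{P_{k/k}\nabla^T\nabla a_k|_{x=\hat{x}_{k/k}}\}$ and $\nabla P_{k/k-1}\nabla^T c_k|_{x=\hat{x}_{k/k-1}} = \mathrm{tr}\{P_{k/k-1}\nabla^T\nabla c_k|_{x=\hat{x}_{k/k-1}}\}$ are assumed in [4], [5].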


10.2.5 Third-Order Extended Kalman Filter 


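Higher order EKFs can be realised just as elegantly as their predecessors. Retaining up to third-order terms within (1) results in

$$\begin{aligned} a_k(x_k) \approx {} & a_k(\hat{x}_{k/k}) + (x_k - \hat{x}_{k/k})^T\nabla^T a_k|_{x=\hat{x}_{k/k}} + \frac{1}{2}\nabla P_{k/k}\nabla^T a_k|_{x=\hat{x}_{k/k}} \\ & + \frac{1}{6}\nabla P_{k/k}\nabla^T(x_k - \hat{x}_{k/k})^T\nabla^T a_k|_{x=\hat{x}_{k/k}} \\ = {} & A_kx_k + \mu_k, \end{aligned} \tag{20}$$

where

$$A_k = \nabla a_k(x)|_{x=\hat{x}_{k/k}} + \frac{1}{6}\nabla P_{k/k}\nabla^T\nabla a_k|_{x=\hat{x}_{k/k}} \tag{21}$$

and $\mu_k = a_k(\hat{x}_{k/k}) - A_k\hat{x}_{k/k} + \frac{1}{2}\nabla P_{k/k}\nabla^T a_k|_{x=\hat{x}_{k/k}}$. Similarly, for the output nonlinearity it is assumed that

$$\begin{aligned} c_k(x_k) \approx {} & c_k(\hat{x}_{k/k-1}) + (x_k - \hat{x}_{k/k-1})^T\nabla^T c_k|_{x=\hat{x}_{k/k-1}} + \frac{1}{2}\nabla P_{k/k-1}\nabla^T c_k|_{x=\hat{x}_{k/k-1}} \\ & + \frac{1}{6}\nabla P_{k/k-1}\nabla^T(x_k - \hat{x}_{k/k-1})^T\nabla^T c_k|_{x=\hat{x}_{k/k-1}} \\ = {} & C_kx_k + \pi_k, \end{aligned} \tag{22}$$

where

$$C_k = \nabla c_k(x)|_{x=\hat{x}_{k/k-1}} + \frac{1}{6}\nabla P_{k/k-1}\nabla^T\nabla c_k|_{x=\hat{x}_{k/k-1}} \tag{23}$$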




G. A. Einicke, Smoothing, Filtering and Prediction: Estimating the Past, Present and Future (2nd ed.), Prime Publishing, 2019


and $\pi_k = c_k(\hat{x}_{k/k-1}) - C_k\hat{x}_{k/k-1} + \frac{1}{2}\nabla P_{k/k-1}\nabla^T c_k|_{x=\hat{x}_{k/k-1}}$. The resulting third-order EKF is defined by (18) - (19) in which the gain is now calculated using (21) and (23).


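Example 1. Consider a linear state evolution $x_{k+1} = Ax_k + w_k$, with A = 0.5, $w_k \in \mathbb{R}$, Q = 0.05, a nonlinear output mapping $y_k = \sin(x_k)$ and noisy observations $z_k = y_k + v_k$, $v_k \in \mathbb{R}$. The first-order EKF for this problem is given by

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - \sin(\hat{x}_{k/k-1})),$$
$$\hat{x}_{k+1/k} = A\hat{x}_{k/k},$$

where $L_k = P_{k/k-1}C_k^T\Omega_k^{-1}$, $\Omega_k = C_kP_{k/k-1}C_k^T + R_k$, $C_k = \cos(\hat{x}_{k/k-1})$, $P_{k/k} = P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}$ and $P_{k+1/k} = A_kP_{k/k}A_k^T + Q_k$. The filtering step within the second-order EKF is amended to

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k\left(z_k - \sin(\hat{x}_{k/k-1}) + \sin(\hat{x}_{k/k-1})P_{k/k-1}/2\right).$$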


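The modified output linearisation for the third-order EKF follows from (23), in which the third derivative of $\sin(x)$ is $-\cos(x)$, namely,

$$C_k = \cos(\hat{x}_{k/k-1}) - \cos(\hat{x}_{k/k-1})P_{k/k-1}/6.$$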


Simulations were conducted in which the signal-to-noise-ratio was varied from 20 
dB to 40 dB for NV = 200,000 realisations of Gaussian noise sequences. The mean- 
square-errors exhibited by the first, second and third-order EKFs are plotted in 
Fig. 1. The figure demonstrates that including higher-order Taylor series terms 
within the filter can provide small performance improvements but the benefit 
diminishes with increasing measurement noise. 
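A minimal Python sketch of the three filters of Example 1 is given below. It is not the simulation behind Fig. 1: the measurement noise variance R = 0.001, the run length, the random seed, the initial prediction and the function names are all assumptions made for illustration. The third-order linearisation is taken from (23) with the third derivative of sin(x) equal to -cos(x).

```python
import math
import random

def simulate(n, A=0.5, Q=0.05, R=0.001, seed=0):
    """Generate states x_k and observations z_k = sin(x_k) + v_k."""
    rng = random.Random(seed)
    x, xs, zs = 0.0, [], []
    for _ in range(n):
        xs.append(x)
        zs.append(math.sin(x) + rng.gauss(0.0, math.sqrt(R)))
        x = A * x + rng.gauss(0.0, math.sqrt(Q))
    return xs, zs

def run_ekf(order, zs, A=0.5, Q=0.05, R=0.001):
    """Scalar EKF of order 1, 2 or 3 for the output mapping y = sin(x)."""
    x_pred, P_pred = 0.0, Q / (1.0 - A * A)   # assumed initial prediction
    estimates = []
    for z in zs:
        s, c = math.sin(x_pred), math.cos(x_pred)
        # Output linearisation: c'(x), plus (1/6) P c'''(x) for the third order
        C = c - (c * P_pred / 6.0 if order == 3 else 0.0)
        L = P_pred * C / (C * P_pred * C + R)
        # Second-order term: -(1/2) P c''(x) = +sin(x) P / 2
        bias = s * P_pred / 2.0 if order >= 2 else 0.0
        x_filt = x_pred + L * (z - s + bias)
        P_filt = P_pred - L * C * P_pred
        estimates.append(x_filt)
        x_pred = A * x_filt                   # the state evolution is linear
        P_pred = A * P_filt * A + Q
    return estimates

def mse(xs, estimates):
    return sum((x - e) ** 2 for x, e in zip(xs, estimates)) / len(xs)

xs, zs = simulate(5000)
for order in (1, 2, 3):
    print(order, mse(xs, run_ekf(order, zs)))
```

Because the state evolution is linear, the higher order terms only alter the correction step and the output linearisation; the prediction and Riccati recursions are shared by all three filters.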


Fig. 1. Mean-square-error (MSE, dB) versus signal-to-noise-ratio (SNR, dB) for Example 1: first-order EKF (solid line), second-order EKF (dashed line) and third-order EKF (dotted-crossed line).


10.3 The Faux Algebraic Riccati Equation Technique


10.3.1 A Nonlinear Observer 


The previously described extended Kalman filters arise by linearising the signal model about the current state estimate and using the linear Kalman filter to predict the next estimate. This attempts to produce a locally optimal filter; however, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique [6] - [10] seeks to improve on EKF performance by trading off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by selecting a positive definite solution to a faux Riccati equation for the gain design.


Assume that data is generated by the following signal model comprising a stable, linear state evolution together with a nonlinear output mapping

$$x_{k+1} = Ax_k + Bw_k, \tag{24}$$
$$z_k = c_k(x_k) + v_k, \tag{25}$$

where the components of $c_k(.)$ are assumed to be continuous differentiable functions. Suppose that it is desired to calculate estimates of the states from the measurements. A nonlinear observer may be constructed having the form

$$\hat{x}_{k+1/k} = A\hat{x}_{k/k-1} + g_k(z_k - c_k(\hat{x}_{k/k-1})), \tag{26}$$

where $g_k(.)$ is a gain function to be designed. From (24) - (26), the state prediction error is given by

$$\tilde{x}_{k+1/k} = A\tilde{x}_{k/k-1} - g_k(\varepsilon_k) + w_k, \tag{27}$$

where $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$ and $\varepsilon_k = z_k - c_k(\hat{x}_{k/k-1})$. The Taylor series expansion of $c_k(.)$ to first order terms leads to $\varepsilon_k \approx C_k\tilde{x}_{k/k-1} + v_k$, where $C_k = \nabla c_k(x)|_{x=\hat{x}_{k/k-1}}$. The objective here is to design $g_k(\varepsilon_k)$ to be a linear function of $\tilde{x}_{k/k-1}$ to first order terms. It will be shown that for certain classes of problems, this objective can be achieved by a suitable choice of a linear bounded matrix function of the states $D_k$, resulting in the time-varying gain function $g_k(\varepsilon_k) = K_kD_k\varepsilon_k$, where $K_k$ is a gain matrix of appropriate dimension. For example, consider $x_k \in \mathbb{R}^n$ and $z_k \in \mathbb{R}^m$, which yield $\varepsilon_k \in \mathbb{R}^m$ and $C_k \in \mathbb{R}^{m \times n}$. Suppose that a linearisation $D_k \in \mathbb{R}^{p \times m}$ can be found so that $\bar{C}_k = D_kC_k \in \mathbb{R}^{p \times n}$ possesses approximately constant terms. Then the locally linearised error (27) may be written as

$$\tilde{x}_{k+1/k} = (A - K_k\bar{C}_k)\tilde{x}_{k/k-1} - K_kD_kv_k + w_k. \tag{28}$$

If $|\lambda_i(A)| < 1$, i = 1, ..., n, and if the pair $(A, \bar{C}_k)$ is completely observable, then the asymptotic stability of (28) can be guaranteed by selecting the gain such that $|\lambda_i(A - K_k\bar{C}_k)| < 1$. A method for selecting the gain is described below.


10.3.2 Gain Selection 


From (28), an approximate equation for the error covariance $P_{k/k-1} = E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$ is

$$P_{k+1/k} = (A - K_k\bar{C}_k)P_{k/k-1}(A - K_k\bar{C}_k)^T + K_kD_kRD_k^TK_k^T + Q, \tag{29}$$

which can be written as

$$P_{k/k} = P_{k/k-1} - P_{k/k-1}\bar{C}_k^T(\bar{C}_kP_{k/k-1}\bar{C}_k^T + D_kRD_k^T)^{-1}\bar{C}_kP_{k/k-1}, \tag{30}$$
$$P_{k+1/k} = AP_{k/k}A^T + Q. \tag{31}$$

In an EKF for the above problem, the gain is obtained by solving the above Riccati difference equation and calculating

$$K_k = P_{k/k-1}\bar{C}_k^T(\bar{C}_kP_{k/k-1}\bar{C}_k^T + D_kRD_k^T)^{-1}. \tag{32}$$

The faux algebraic Riccati equation approach [6] - [10] is motivated by connections between Riccati difference equation and algebraic Riccati equation solutions. Indeed, it is noted for some nonlinear problems that the gains can converge to a steady-state matrix [3]. This technique is also known as ‘covariance setting’. Following the approach of [10], the Riccati difference equation (30) may be masqueraded by the faux algebraic Riccati equation

$$\Sigma_{k/k} = \Sigma_k - \Sigma_k\bar{C}_k^T(\bar{C}_k\Sigma_k\bar{C}_k^T + D_kRD_k^T)^{-1}\bar{C}_k\Sigma_k. \tag{33}$$

That is, rather than solve (30), an arbitrary positive definite solution $\Sigma_k$ is assumed instead and then the gain at each time k is calculated from (31) - (32) using $\Sigma_k$ in place of $P_{k/k-1}$.
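As a minimal sketch of this covariance-setting step, suppose that the linearised output $\bar{C}_k$ is a single row (p = 1), so that the inverse in (32) is a scalar division. The function name `faux_gain`, the assumed positive definite matrix, the row [0, 1] and the value $D_kRD_k^T = 0.1$ below are illustrative assumptions only.

```python
def faux_gain(sigma, c_bar, r_bar):
    """Gain (32) with P_{k/k-1} replaced by an assumed matrix sigma.

    sigma : assumed symmetric positive definite matrix (list of rows)
    c_bar : single-row linearised output matrix (list of length n)
    r_bar : scalar D R D^T
    """
    n = len(c_bar)
    sigma_ct = [sum(sigma[i][j] * c_bar[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(c_bar[i] * sigma_ct[i] for i in range(n)) + r_bar
    return [s / innovation_var for s in sigma_ct]

K = faux_gain([[0.02, 0.05], [0.05, 0.7]], [0.0, 1.0], 0.1)
print(K)
```

No Riccati recursion is run here: the designer fixes the matrix once and the gain follows directly, which is what makes the resulting observer easy to constrain for stability.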


10.3.3 Tracking Multiple Signals 


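Consider the problem of tracking two frequency or phase modulated signals which may be modelled by equation (34), where $a_k^{(1)}$, $a_k^{(2)}$, $\omega_k^{(1)}$, $\omega_k^{(2)}$, $\phi_k^{(1)}$, $\phi_k^{(2)}$, $\mu_a^{(1)}$, $\mu_a^{(2)}$, $\mu_\omega^{(1)}$, $\mu_\omega^{(2)} \in \mathbb{R}$ and $w_k^{(1)}, \ldots, w_k^{(6)} \in \mathbb{R}$ are zero-mean, uncorrelated, white processes with covariance $Q = \mathrm{diag}(\sigma_{w^{(1)}}^2, \ldots, \sigma_{w^{(6)}}^2)$. The states $a_k^{(i)}$, $\omega_k^{(i)}$ and $\phi_k^{(i)}$, i = 1, 2, represent the signals’ instantaneous amplitude, frequency and phase components, respectively.

$$\begin{bmatrix} a_{k+1}^{(1)} \\ \omega_{k+1}^{(1)} \\ \phi_{k+1}^{(1)} \\ a_{k+1}^{(2)} \\ \omega_{k+1}^{(2)} \\ \phi_{k+1}^{(2)} \end{bmatrix} = \begin{bmatrix} \mu_a^{(1)} & 0 & 0 & 0 & 0 & 0 \\ 0 & \mu_\omega^{(1)} & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \mu_a^{(2)} & 0 & 0 \\ 0 & 0 & 0 & 0 & \mu_\omega^{(2)} & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} a_k^{(1)} \\ \omega_k^{(1)} \\ \phi_k^{(1)} \\ a_k^{(2)} \\ \omega_k^{(2)} \\ \phi_k^{(2)} \end{bmatrix} + \begin{bmatrix} w_k^{(1)} \\ w_k^{(2)} \\ w_k^{(3)} \\ w_k^{(4)} \\ w_k^{(5)} \\ w_k^{(6)} \end{bmatrix} \tag{34}$$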
Let 
Zz) 1 | a® cosd | |v 
ze |_| at?sin gg? | |v? as 
Zo lay” CSO ||) |e 
2 | | a sing? | |v 


denote the complex baseband observations, where v\”, ..., v,) © R are zero- 


a . . . 2 2 
mean, uncorrelated, white processes with covariance R = diag(O,,.... 1s )- 


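Expanding the prediction error to linear terms yields $C_k = [C_k^{(1)} \ C_k^{(2)}]$, where

$$C_k^{(i)} = \begin{bmatrix} \cos\hat{\phi}_{k/k-1}^{(i)} & 0 & -\hat{a}_{k/k-1}^{(i)}\sin\hat{\phi}_{k/k-1}^{(i)} \\ \sin\hat{\phi}_{k/k-1}^{(i)} & 0 & \hat{a}_{k/k-1}^{(i)}\cos\hat{\phi}_{k/k-1}^{(i)} \end{bmatrix}.$$

This form suggests the choice $D_k = \begin{bmatrix} D_k^{(1)} \\ D_k^{(2)} \end{bmatrix}$, where

$$D_k^{(i)} = \begin{bmatrix} \cos\hat{\phi}_{k/k-1}^{(i)} & \sin\hat{\phi}_{k/k-1}^{(i)} \\ -\sin\hat{\phi}_{k/k-1}^{(i)} / \hat{a}_{k/k-1}^{(i)} & \cos\hat{\phi}_{k/k-1}^{(i)} / \hat{a}_{k/k-1}^{(i)} \end{bmatrix}.$$

In the multiple signal case, the linearisation $\bar{C}_k = D_kC_k$ does not result in perfect decoupling. While the diagonal blocks reduce to $\bar{C}_k^{(i,i)} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, the off-diagonal blocks are

$$\bar{C}_k^{(i,j)} = \begin{bmatrix} \cos(\hat{\phi}_{k/k-1}^{(i)} - \hat{\phi}_{k/k-1}^{(j)}) & 0 & \hat{a}_{k/k-1}^{(j)}\sin(\hat{\phi}_{k/k-1}^{(i)} - \hat{\phi}_{k/k-1}^{(j)}) \\ -\dfrac{1}{\hat{a}_{k/k-1}^{(i)}}\sin(\hat{\phi}_{k/k-1}^{(i)} - \hat{\phi}_{k/k-1}^{(j)}) & 0 & \dfrac{\hat{a}_{k/k-1}^{(j)}}{\hat{a}_{k/k-1}^{(i)}}\cos(\hat{\phi}_{k/k-1}^{(i)} - \hat{\phi}_{k/k-1}^{(j)}) \end{bmatrix}.$$

Assuming a symmetric positive definite solution to (33) of the form $\Sigma_k = \begin{bmatrix} \Sigma^a & 0 & 0 \\ 0 & \Sigma^\omega & \Sigma^{\omega\phi} \\ 0 & \Sigma^{\omega\phi} & \Sigma^\phi \end{bmatrix}$, with $\Sigma^a$, $\Sigma^\omega$, $\Sigma^{\omega\phi}$, $\Sigma^\phi \in \mathbb{R}$ and choosing the gains according to (32) yields $K_k = \begin{bmatrix} K^a & 0 \\ 0 & K^\omega \\ 0 & K^\phi \end{bmatrix}$, where $K^a = \Sigma^a(\Sigma^a + \sigma_v^2)^{-1}$, $K^\omega = \Sigma^{\omega\phi}(\Sigma^\phi + \sigma_v^2\hat{a}_{k/k-1}^{-2})^{-1}$ and $K^\phi = \Sigma^\phi(\Sigma^\phi + \sigma_v^2\hat{a}_{k/k-1}^{-2})^{-1}$. The nonlinear observer then becomes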


Ali) 


pe sin ky + oe ? 


Ai) _ Ai) a 

Bijy = Ay +X, (Zz COSP, 4 + Z; 
(1) es A) og ¢ (1) Fi) (2) gi Ali) x (i) @ 2 (i) = 
Op, = Opp F Uy" (Zp COSHH, +2 SING. Ae aZe FO, (Ana) > 


Ri) _ Ai) o(-() Ri) (2) gin AC) A (i) @ 2 (i) =i 
kik = Print Uy (Zp COSP 4 +2," SIN Pe, MA are +O, Ana) - 


10.3.4 Stability Conditions 


In order to establish conditions for the error system (28) to be asymptotically stable, the problem is recast in a passivity framework as follows. Let $w = \begin{bmatrix} w^{(1)} \\ \vdots \\ w^{(m)} \end{bmatrix}$, $e = \begin{bmatrix} e^{(1)} \\ \vdots \\ e^{(m)} \end{bmatrix} \in \mathbb{R}^m$. Consider the configuration of Fig. 2, in which there is a cascade of a stable linear system $\mathcal{G}$ and a nonlinear function matrix $\gamma(.)$ acting on e. It follows from the figure that

$$e = w - \mathcal{G}\gamma(e). \tag{36}$$

Let $\nabla$ denote a forward difference operator with $\nabla e_k = e_{k+1} - e_k$. It is assumed that $\gamma(.)$ satisfies some sector conditions which may be interpreted as bounds existing on the slope of the components of $\gamma(.)$; see Theorem 14, p. 7 of [11].


Fig. 2. Nonlinear error system configuration. Fig. 3. Stable gain space for Example 2. 


Lemma 1 [10]: Consider the system (36), where w, e $\in \mathbb{R}^m$. Suppose that $\gamma(.)$ consists of m identical, noninteracting nonlinearities, with $\gamma(e^{(i)})$ monotonically increasing in the sector $[0, \beta]$, $\beta \ge 0$, $\beta \in \mathbb{R}$, that is,

$$0 \le \gamma(e^{(i)})/e^{(i)} \le \beta \tag{37}$$

for all $e^{(i)} \in \mathbb{R}$, $e^{(i)} \neq 0$. Assume that $\mathcal{G}$ is a causal, stable, finite-gain, time-invariant map $\mathbb{R}^m \to \mathbb{R}^m$, having a z-transform G(z), which is bounded on the unit circle. Let I denote an $m \times m$ identity matrix. Suppose that for some q > 0, $q \in \mathbb{R}$, there exists a $\delta \in \mathbb{R}$, such that

$$\langle (G(z) + q\nabla G(z) + I\beta^{-1})e, e\rangle \ge \delta\langle e, e\rangle \tag{38}$$

for all $e^{(i)} \in \mathbb{R}$. Under these conditions $w \in \ell_2$ implies e, $\gamma(e^{(i)}) \in \ell_2$.

Proof: From (36), $\nabla w = \nabla e + \nabla G(z)\gamma(e)$ and $w + q\nabla w = (G(z) + q\nabla G(z) + I\beta^{-1})\gamma(e) + e - I\beta^{-1}\gamma(e) + q\nabla e$. Then

$$\langle w + q\nabla w, \gamma(e)\rangle = \langle e - I\beta^{-1}\gamma(e), \gamma(e)\rangle + \langle q\nabla e, \gamma(e)\rangle + \langle (G(z) + q\nabla G(z) + I\beta^{-1})\gamma(e), \gamma(e)\rangle. \tag{39}$$

Consider the first term on the right hand side of (39). Since $\gamma(e)$ consists of noninteracting nonlinearities, $\langle \gamma(e), e\rangle = \sum_{i=1}^{m}\langle \gamma(e^{(i)}), e^{(i)}\rangle$ and $\langle e - I\beta^{-1}\gamma(e), \gamma(e)\rangle = \sum_{i=1}^{m}\langle e^{(i)} - \gamma(e^{(i)})\beta^{-1}, \gamma(e^{(i)})\rangle \ge 0$. Using the approach of [11] together with the sector conditions on the identical noninteracting nonlinearities (37), it can be shown that expanding out the second term of (39) yields $\langle \nabla e, \gamma(e)\rangle \ge 0$. Using $\|\nabla w\|_2 \le 2\|w\|_2$ (from p. 192 of [11]), the Schwarz inequality and the triangle inequality, it can be shown that

$$\langle w + q\nabla w, \gamma(e)\rangle \le (1 + 2q)\|w\|_2\|\gamma(e)\|_2. \tag{40}$$

It follows from (38) - (40) that $\|\gamma(e)\|_2 \le (1 + 2q)\delta^{-1}\|w\|_2$; hence $\gamma(e) \in \ell_2$. Since the gain of G(z) is finite, it also follows that $G(z)\gamma(e^{(i)}) \in \ell_2$.


If G(z) is stable and bounded on the unit circle, then the test condition (38) becomes

$$\lambda_{\min}\{(I + q(I - z^{-1}I))(G(z) + G^H(z)) + \beta^{-1}I\} \ge \delta, \tag{41}$$

see pp. 175 and 194 of [11].


10.3.5 Applications 


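Example 2 [10]. Consider a unity-amplitude frequency modulated (FM) signal modelled as $\omega_{k+1} = \mu_\omega\omega_k + w_k$, $\phi_{k+1} = \phi_k + \omega_k$, $z_k^{(1)} = \cos(\phi_k) + v_k^{(1)}$ and $z_k^{(2)} = \sin(\phi_k) + v_k^{(2)}$. The error system for an FM demodulator may be written as

$$\begin{bmatrix} \tilde{\omega}_{k+1} \\ \tilde{\phi}_{k+1} \end{bmatrix} = \begin{bmatrix} \mu_\omega & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \tilde{\omega}_k \\ \tilde{\phi}_k \end{bmatrix} - \begin{bmatrix} K_1 \\ K_2 \end{bmatrix}\sin(\tilde{\phi}_k) + \begin{bmatrix} w_k \\ 0 \end{bmatrix} \tag{42}$$

for gains $K_1$, $K_2 \in \mathbb{R}$ to be designed. In view of the form (36), the above error system is reformatted as

$$\begin{bmatrix} \tilde{\omega}_{k+1} \\ \tilde{\phi}_{k+1} \end{bmatrix} = \begin{bmatrix} \mu_\omega & -K_1 \\ 1 & 1 - K_2 \end{bmatrix}\begin{bmatrix} \tilde{\omega}_k \\ \tilde{\phi}_k \end{bmatrix} + \begin{bmatrix} K_1 \\ K_2 \end{bmatrix}\gamma(\tilde{\phi}_k) + \begin{bmatrix} w_k \\ 0 \end{bmatrix}, \tag{43}$$

where $\gamma(x) = x - \sin(x)$. The z-transform of the linear part of (43) is $G(z) = (K_2z + K_1 - \mu_\omega K_2)(z^2 + (K_2 - 1 - \mu_\omega)z + K_1 + \mu_\omega - \mu_\omega K_2)^{-1}$. The nonlinearity satisfies the sector condition (37) for $\beta$ = 1.22. Candidate gains may be assessed by checking that G(z) is stable and the test condition (41). The stable gain space calculated for the case of $\mu_\omega$ = 0.9 is plotted in Fig. 3. The gains are required to lie within the shaded region of the plot for the error system (42) to be asymptotically stable.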
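Two of the checks mentioned in Example 2 can be sketched in Python. The first finds the sector bound of $\gamma(x) = x - \sin(x)$ on a grid, and the second tests whether the poles of G(z) lie inside the unit circle, taking the denominator as $z^2 + (K_2 - 1 - \mu_\omega)z + K_1 + \mu_\omega - \mu_\omega K_2$. The grid size, the candidate gains and the function names are assumptions for illustration, and the sketch does not evaluate the test condition (41), so it does not reproduce the stable gain space of Fig. 3.

```python
import cmath
import math

def sector_bound(n=200000, x_max=50.0):
    """Largest value of gamma(x)/x for gamma(x) = x - sin(x), found on a grid.

    gamma(x)/x is even in x, so only positive x values are searched.
    """
    return max(1.0 - math.sin(x) / x
               for x in (x_max * (i + 1) / n for i in range(n)))

def g_poles(K1, K2, mu=0.9):
    """Roots of z^2 + (K2 - 1 - mu) z + K1 + mu - mu*K2."""
    b = K2 - 1.0 - mu
    c = K1 + mu - mu * K2
    d = cmath.sqrt(b * b - 4.0 * c)
    return (-b + d) / 2.0, (-b - d) / 2.0

def g_is_stable(K1, K2, mu=0.9):
    return all(abs(p) < 1.0 for p in g_poles(K1, K2, mu))

beta = sector_bound()
print(round(beta, 3))            # sector bound, close to the quoted 1.22
print(g_is_stable(0.1, 0.8))     # assumed candidate gains
print(g_is_stable(1.0, 0.0))     # assumed candidate gains
```

Pole location is only the first screening step: a gain pair that passes it would still need to satisfy (41) before being admitted to the stable gain space.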


Fig. 4. Demodulation performance (versus SNR, dB) for Example 2: (i) EKF and (ii) nonlinear observer.
Fig. 5. Demodulation performance (versus SNR, dB) for Example 3: (i) EKF and (ii) nonlinear observer.


A speech utterance, namely, the phrase “Matlab is number one”, was sampled at 8 kHz and used to synthesize a unity-amplitude FM signal. An EKF demodulator was constructed for the above model with $\sigma_w^2$ = 0.02. In a nonlinear observer design it was found that suitable parameter choices were $\Sigma_k = \begin{bmatrix} 0.001 & 0.08 \\ 0.08 & 0.7 \end{bmatrix}$. The nonlinear observer gains were censored at each time k according to the stable gain space of Fig. 3. The results of a simulation study using 100 realisations of Gaussian measurement noise sequences are shown in Fig. 4. The figure demonstrates that enforcing stability can be beneficial at low SNR, at the cost of degraded high-SNR performance.
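The structure of such a demodulator can be sketched in Python using the observer form (26) with fixed gains. This is not the simulation reported in Fig. 4: the message here is a synthetic random frequency sequence rather than speech, and the gains $K_1$ = 0.1, $K_2$ = 0.8, the measurement noise variance 0.01, the seed and all function names are assumptions. The choice $D_k = [-\sin\hat{\phi}_{k/k-1} \ \ \cos\hat{\phi}_{k/k-1}]$ is also assumed; it makes the scaled residual approximately the sine of the phase error.

```python
import math
import random

def simulate_fm(n, mu=0.9, var_w=0.02, var_v=0.01, seed=1):
    """Unity-amplitude FM model: omega_{k+1} = mu*omega_k + w_k, phi_{k+1} = phi_k + omega_k."""
    rng = random.Random(seed)
    omega, phi = 0.0, 0.0
    phis, z1, z2 = [], [], []
    for _ in range(n):
        phis.append(phi)
        z1.append(math.cos(phi) + rng.gauss(0.0, math.sqrt(var_v)))
        z2.append(math.sin(phi) + rng.gauss(0.0, math.sqrt(var_v)))
        omega, phi = mu * omega + rng.gauss(0.0, math.sqrt(var_w)), phi + omega
    return phis, z1, z2

def fm_observer(z1, z2, K1, K2, mu=0.9):
    """Observer (26) with g(e) = K D e and D = [-sin(phi_hat), cos(phi_hat)]."""
    w_hat, phi_hat, phase_predictions = 0.0, 0.0, []
    for a, b in zip(z1, z2):
        phase_predictions.append(phi_hat)
        # D times the residual simplifies to z2*cos(phi_hat) - z1*sin(phi_hat)
        e = b * math.cos(phi_hat) - a * math.sin(phi_hat)
        w_hat, phi_hat = mu * w_hat + K1 * e, w_hat + phi_hat + K2 * e
    return phase_predictions

def mean_phase_cost(phis, estimates):
    """Mean of 1 - cos(phase error), which is insensitive to 2*pi wraps."""
    return sum(1.0 - math.cos(p - q) for p, q in zip(phis, estimates)) / len(phis)

phis, z1, z2 = simulate_fm(5000)
tracked = mean_phase_cost(phis, fm_observer(z1, z2, 0.1, 0.8))
untracked = mean_phase_cost(phis, fm_observer(z1, z2, 0.0, 0.0))
print(tracked, untracked)
```

With zero gains the observer ignores the data, so comparing the two costs gives a quick sanity check that the assumed gains keep the phase estimate locked to the signal.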


Example 3 [10]. Suppose that there are two superimposed FM signals present in 
the same frequency channel. Neglecting observation noise, a_ suitable 
approximation of the demodulator error system in the form (36) is given by 


(1) 
O41 QW, 
7 (1) 7 (1) (AW) _ 7 
=. sin —@, 
in =(A-K,C) * IK (. ) e, ; (44) 
~(2) k > (2) Va ee (2) (2) 
O41 @, sin(¢,””) — g, 
(2) (2) 
k+1 k 


0] — fo 100 
where A = diag(A®, 4%), 4W=|4 ©) @ = 
11 OOO 4 


(44) may be written as G(z) = C(zl — (A — K,C))'K,. Two 8-kHz speech 
k k 


. The linear part of 


“To avoid criticism, do nothing, say nothing, be nothing.” Elbert Hubbard 


G. A. Einicke, Smoothing, Filtering and Prediction: Estimating 307 
the Past, Present and Future (2™ ed.), Prime Publishing, 2019 


utterances, “Matlab is number one” and “Number one is Matlab”, centred at +0.25 
rad/s, were used to synthesize two superimposed unity-amplitude FM signals. 
Simulations were conducted using 100 realisations of Gaussian measurement 
noise sequences. The test condition (41) was evaluated at each time $k$ for the
above parameter values with 6 = 1.2, g = 0.001, 6 = 0.82 and used to censor the 
gains. The resulting co-channel demodulation performance is shown in Fig. 5. It 
can be seen that the nonlinear observer significantly outperforms the EKF at high 
SNR. 


Two mechanisms have been observed for occurrence of outliers or faults within 
the co-channel demodulators. Firstly, errors can occur in the state attribution, that
is, there is correct tracking of some component speech message segments but the 
tracks are inconsistently associated with the individual signals. This is illustrated 
by the example frequency estimate tracks shown in Figs. 6 and 7. The solid and 
dashed lines in the figures indicate two sample co-channel frequency tracks. 
Secondly, the phase unwrapping can be erroneous so that the frequency tracks 
bear no resemblance to the underlying messages. These faults can occur without 
any significant deterioration in the error residual. 


[Figures omitted. Horizontal axes: time, ms.]
Fig. 6. Sample EKF frequency tracks for Example 3.
Fig. 7. Sample nonlinear observer frequency tracks for Example 3.


The EKF demodulator is observed to be increasingly fault prone at higher SNR. 
This arises because lower SNR designs possess narrower bandwidths and so are 
less sensitive to nearby frequency components. The figures also illustrate the 
trade-off between stability and optimality. In particular, it can be seen from Fig. 6, 
that the sample EKF speech estimates exhibit faults in the state attribution. This 
contrasts with Fig. 7, where the nonlinear observer’s estimates exhibit stable state 
attribution at the cost of degraded speech fidelity. 


“You have enemies? Good. That means you’ve stood up for something, sometime in your life.” 
Winston Churchill 
10.4 Robust Extended Kalman Filtering


10.4.1 Nonlinear Problem Statement 


Consider again the nonlinear, discrete-time signal model (2), (7). It is shown
below that the $H_\infty$ techniques of Chapter 9 can be used to recast nonlinear filtering
problems into a model uncertainty setting. The following discussion attends to
state estimation, that is, $C_{1,k} = I$ is assumed within the problem and solution
presented in Section 9.3.2.


The Taylor series expansions of the nonlinear functions $a_k(.)$, $b_k(.)$ and $c_k(.)$ about
filtered and predicted estimates $\hat{x}_{k/k}$ and $\hat{x}_{k/k-1}$ may be written as


$$a_k(x_k) = a_k(\hat{x}_{k/k}) + \nabla a_k(\hat{x}_{k/k})(x_k - \hat{x}_{k/k}) + \Delta_1(\tilde{x}_{k/k}), \qquad (45)$$
$$b_k(x_k) = b_k(\hat{x}_{k/k}) + \Delta_2(\tilde{x}_{k/k}), \qquad (46)$$
$$c_k(x_k) = c_k(\hat{x}_{k/k-1}) + \nabla c_k(\hat{x}_{k/k-1})(x_k - \hat{x}_{k/k-1}) + \Delta_3(\tilde{x}_{k/k-1}), \qquad (47)$$


where $\Delta_1(.)$, $\Delta_2(.)$, $\Delta_3(.)$ are uncertainties that account for the higher order
terms, $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$ and $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$. It is assumed that $\Delta_1(.)$, $\Delta_2(.)$ and
$\Delta_3(.)$ are continuous operators mapping $\ell_2 \to \ell_2$, with $H_\infty$ norms bounded by $\delta_1$,
$\delta_2$ and $\delta_3$, respectively.


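Substituting (45)–(47) into the nonlinear system (2), (7) gives the linearised
system


$$x_{k+1} = A_k x_k + B_k w_k + \phi_k + \Delta_1(\tilde{x}_{k/k}) + \Delta_2(\tilde{x}_{k/k}) w_k, \qquad (48)$$
$$z_k = C_k x_k + \pi_k + \Delta_3(\tilde{x}_{k/k-1}) + v_k, \qquad (49)$$


where $A_k = \nabla a_k(x)\big|_{x = \hat{x}_{k/k}}$, $B_k = b_k(\hat{x}_{k/k})$, $C_k = \nabla c_k(x)\big|_{x = \hat{x}_{k/k-1}}$,
$\phi_k = a_k(\hat{x}_{k/k}) - A_k \hat{x}_{k/k}$ and $\pi_k = c_k(\hat{x}_{k/k-1}) - C_k \hat{x}_{k/k-1}$.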


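Note that the first-order EKF for the above system arises by setting the
uncertainties $\Delta_1(.)$, $\Delta_2(.)$ and $\Delta_3(.)$ to zero as


$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - c_k(\hat{x}_{k/k-1})), \qquad (50)$$
$$\hat{x}_{k+1/k} = a_k(\hat{x}_{k/k}), \qquad (51)$$
$$L_k = P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1}, \qquad (52)$$
$$P_{k/k} = P_{k/k-1} - P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1}, \qquad (53)$$
$$P_{k+1/k} = A_k P_{k/k} A_k^T + B_k Q_k B_k^T. \qquad (54)$$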
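
The corrector-predictor cycle of the first-order EKF can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `ekf_step`, the callables `a`, `c` (the nonlinearities) and `jac_a`, `jac_c` (their Jacobians) are hypothetical, and $B_k$ is passed in as a matrix that has already been evaluated.

```python
import numpy as np

def ekf_step(x_pred, P_pred, z, a, c, jac_a, jac_c, B, Q, R):
    """One corrector-predictor cycle of a first-order EKF (illustrative sketch)."""
    C = jac_c(x_pred)                          # C_k, linearised at the predicted estimate
    S = C @ P_pred @ C.T + R                   # innovation covariance
    L = P_pred @ C.T @ np.linalg.inv(S)        # filter gain, cf. (52)
    x_filt = x_pred + L @ (z - c(x_pred))      # corrected state, cf. (50)
    P_filt = P_pred - L @ C @ P_pred           # corrected covariance, cf. (53)
    A = jac_a(x_filt)                          # A_k, linearised at the filtered estimate
    x_next = a(x_filt)                         # predicted state, cf. (51)
    P_next = A @ P_filt @ A.T + B @ Q @ B.T    # predicted covariance, cf. (54)
    return x_filt, P_filt, x_next, P_next
```

For a linear model the callables reduce to matrix products and the sketch coincides with the standard Kalman filter recursions.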


“Fight the good fight.” Timothy 4:7 
10.4.2 Robust Solution


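Following the approach in Section 9.2.4, instead of addressing the problem (48)–(49)
which possesses uncertainties, an auxiliary $H_\infty$ problem is defined as


$$x_{k+1} = A_k x_k + B_k w_k + \phi_k + s_k, \qquad (55)$$
$$z_k = C_k x_k + \pi_k + v_k + t_k, \qquad (56)$$
$$\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}, \qquad (57)$$


where $s_k = \Delta_1(\tilde{x}_{k/k}) + \Delta_2(\tilde{x}_{k/k}) w_k$ and $t_k = \Delta_3(\tilde{x}_{k/k-1}) \approx \Delta_3(\tilde{x}_{k/k})$ are additional
exogenous inputs satisfying

$$\|s_k\|_2^2 \le \delta_1^2 \|\tilde{x}_{k/k}\|_2^2 + \delta_2^2 \|w_k\|_2^2, \qquad (58)$$
$$\|t_k\|_2^2 \le \delta_3^2 \|\tilde{x}_{k/k-1}\|_2^2 \approx \delta_3^2 \|\tilde{x}_{k/k}\|_2^2. \qquad (59)$$


A sufficient solution to the auxiliary $H_\infty$ problem (55)–(57) can be obtained by
solving another problem in which $w_k$ and $v_k$ are scaled in lieu of the additional
inputs $s_k$ and $t_k$. The scaled $H_\infty$ problem is defined by


$$x_{k+1} = A_k x_k + B_k c_w w_k + \phi_k, \qquad (60)$$
$$z_k = C_k x_k + c_v v_k + \pi_k, \qquad (61)$$
$$\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}, \qquad (62)$$


where $c_w, c_v \in \mathbb{R}$ are to be found.
Lemma 2 [12]: The solution of the $H_\infty$ problem (60)–(62), where $v_k$ is scaled by
$$c_v^2 = 1 - \gamma^2\delta_1^2 - \gamma^2\delta_3^2, \qquad (63)$$
and $w_k$ is scaled by
$$c_w^2 = c_v^2 (1 + \delta_2^2)^{-1}, \qquad (64)$$
is sufficient for the solution of the auxiliary $H_\infty$ problem (55)–(57).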


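Proof: If the $H_\infty$ problem (55)–(57) has been solved then there exists a $\gamma \neq 0$
such that


$$\|\tilde{x}_{k/k}\|_2^2 \le \gamma^2 \left( \|w_k\|_2^2 + \|s_k\|_2^2 + \|t_k\|_2^2 + \|v_k\|_2^2 \right)$$
$$\le \gamma^2 \left( \|w_k\|_2^2 + \delta_1^2 \|\tilde{x}_{k/k}\|_2^2 + \delta_2^2 \|w_k\|_2^2 + \delta_3^2 \|\tilde{x}_{k/k}\|_2^2 + \|v_k\|_2^2 \right),$$


which implies

$$(1 - \gamma^2\delta_1^2 - \gamma^2\delta_3^2) \|\tilde{x}_{k/k}\|_2^2 \le \gamma^2 \left( (1 + \delta_2^2) \|w_k\|_2^2 + \|v_k\|_2^2 \right)$$

and

$$\|\tilde{x}_{k/k}\|_2^2 \le \gamma^2 \left( c_w^{-2} \|w_k\|_2^2 + c_v^{-2} \|v_k\|_2^2 \right).$$


“You can’t wait for inspiration. You have to go after it with a club.” Jack London


The robust first-order extended Kalman filter for state estimation is given by (50)–(52),

$$P_{k/k} = P_{k/k-1} - P_{k/k-1} \begin{bmatrix} I & C_k^T \end{bmatrix} \begin{bmatrix} P_{k/k-1} - \gamma^2 I & P_{k/k-1} C_k^T \\ C_k P_{k/k-1} & R_k + C_k P_{k/k-1} C_k^T \end{bmatrix}^{-1} \begin{bmatrix} I \\ C_k \end{bmatrix} P_{k/k-1}$$

and (54). As discussed in Chapter 9, a search is required for a minimum $\gamma$ such
that $\begin{bmatrix} P_{k/k-1} - \gamma^2 I & P_{k/k-1} C_k^T \\ C_k P_{k/k-1} & R_k + C_k P_{k/k-1} C_k^T \end{bmatrix} > 0$ and $P_{k/k-1} > 0$ over $k \in [1, N]$. An
illustration is provided below.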
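
The robust covariance correction for state estimation, in which the standard correction is replaced by one involving a block matrix containing $P_{k/k-1} - \gamma^2 I$, can be sketched as follows. This is an illustration only: the function names `robust_correction` and `is_positive_definite` are hypothetical, and time-invariant `C` and `R` are assumed for brevity. For very large $\gamma$ the result approaches the standard correction (53).

```python
import numpy as np

def robust_correction(P_pred, C, R, gamma):
    """Robust covariance correction for state estimation (identity error mapping)."""
    n = P_pred.shape[0]
    # Block matrix [[P - gamma^2 I, P C^T], [C P, R + C P C^T]].
    M = np.block([[P_pred - gamma**2 * np.eye(n), P_pred @ C.T],
                  [C @ P_pred, R + C @ P_pred @ C.T]])
    G = np.vstack([np.eye(n), C])               # stacked [I; C_k]
    return P_pred - P_pred @ G.T @ np.linalg.solve(M, G @ P_pred)

def is_positive_definite(X):
    """Check positive definiteness via the eigenvalues of the symmetric part."""
    return bool(np.all(np.linalg.eigvalsh(0.5 * (X + X.T)) > 0.0))
```

A search for a suitable $\gamma$ could call `robust_correction` inside the filter recursion for a decreasing sequence of candidate values and retain the smallest one for which the positivity conditions hold at every time step.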


[Figure omitted. Axes: MSE, dB (horizontal); cumulative frequency (vertical).]
Fig. 8. Histogram of demodulator mean-square-error for Example 4: (i) first-order EKF (solid line) and
(ii) first-order robust EKF (dotted line).


Example 4 [12]. Suppose that an FM signal is generated by


$$\omega_{k+1} = \mu_\omega \omega_k + w_k, \qquad (65)$$
$$\phi_{k+1} = \arctan(\mu_\phi \phi_k + \omega_k), \qquad (66)$$
$$z_k^{(1)} = \cos(\phi_k) + v_k^{(1)}, \qquad (67)$$
$$z_k^{(2)} = \sin(\phi_k) + v_k^{(2)}. \qquad (68)$$


The objective is to construct an FM demodulator that produces estimates of the
frequency message $\omega_k$ from the noisy in-phase and quadrature measurements $z_k^{(1)}$
and $z_k^{(2)}$, respectively.


“Most pioneers are at the mercy of doubt at the beginning, whether of their worth, of their theories, or
of the whole enigmatic field in which they labour.” Johann Wolfgang von Goethe


Simulations were conducted with $\mu_\omega = 0.9$, $\mu_\phi = 0.99$ and
$\sigma_{v^{(1)}}^2 = \sigma_{v^{(2)}}^2 = 0.001$. It was found for $\sigma_w^2 < 0.1$, where the state behaviour is
almost linear, a robust EKF does not improve on the EKF. However, when $\sigma_w^2 = 1$,
the problem is substantially nonlinear and a performance benefit can be
observed. A robust EKF demodulator was designed with

$$x_k = \begin{bmatrix} \phi_k \\ \omega_k \end{bmatrix}, \quad A_k = \begin{bmatrix} \mu_\phi \left( (\mu_\phi \hat{\phi}_{k/k} + \hat{\omega}_{k/k})^2 + 1 \right)^{-1} & \left( (\mu_\phi \hat{\phi}_{k/k} + \hat{\omega}_{k/k})^2 + 1 \right)^{-1} \\ 0 & \mu_\omega \end{bmatrix}, \quad C_k = \begin{bmatrix} -\sin(\hat{\phi}_{k/k-1}) & 0 \\ \cos(\hat{\phi}_{k/k-1}) & 0 \end{bmatrix},$$

$\delta_1 = 0.1$, $\delta_2 = 4.5$ and $\delta_3 = 0.001$. It was found that $\gamma = 1.38$ was sufficient for $P_{k/k-1}$
of the above Riccati difference equation to always be positive definite. A
histogram of the observed frequency estimation error is shown in Fig. 8, which
demonstrates that the robust demodulator provides improved mean-square-error
performance. For sufficiently large $\sigma_w^2$, the output of the above model will
resemble a digital signal, in which case a detector may outperform a demodulator.
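
The linearisations used for the FM model of Example 4 can be checked numerically. The sketch below is an illustration under assumptions: it takes the state ordering $[\phi_k, \omega_k]$, uses the values $\mu_\omega = 0.9$ and $\mu_\phi = 0.99$ quoted above, and compares analytic Jacobians with central finite differences. All function names are hypothetical.

```python
import numpy as np

MU_W, MU_PHI = 0.9, 0.99   # parameter values quoted in Example 4

def a(x):
    """State transition of (65)-(66) for the assumed ordering x = [phi, omega]."""
    phi, w = x
    return np.array([np.arctan(MU_PHI * phi + w), MU_W * w])

def c(x):
    """Noise-free in-phase and quadrature outputs of (67)-(68)."""
    return np.array([np.cos(x[0]), np.sin(x[0])])

def jac_a(x):
    """Analytic Jacobian of a(.)."""
    phi, w = x
    d = 1.0 / ((MU_PHI * phi + w) ** 2 + 1.0)
    return np.array([[MU_PHI * d, d], [0.0, MU_W]])

def jac_c(x):
    """Analytic Jacobian of c(.)."""
    return np.array([[-np.sin(x[0]), 0.0], [np.cos(x[0]), 0.0]])

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian, used only to check the analytic forms."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2.0 * eps))
    return np.column_stack(cols)
```

Agreement between `jac_a` and `numerical_jacobian(a, x)` at a few test points is a quick sanity check before the matrices are used within a filter.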


10.5 Nonlinear Smoothing 


10.5.1 Approximate Minimum-Variance Smoother 


Consider again a nonlinear estimation problem where $x_{k+1} = a_k(x_k) + B_k w_k$, $z_k = c_k(x_k) + v_k$,
with $x_k \in \mathbb{R}$, in which the nonlinearities $a_k(.)$, $c_k(.)$ are assumed to be
smooth, differentiable functions of appropriate dimension. The linearisations akin 
to Extended Kalman filtering may be applied within the smoothers described in 
Chapter 7 in the pursuit of performance improvement. The fixed-lag, Fraser-Potter 
and Rauch-Tung-Striebel smoother recursions are easier to apply as they are less 
complex. The application of the minimum-variance smoother can yield 
approximately optimal estimates when the problem becomes linear, provided that 
the underlying assumptions are correct. 


Procedure 1. An approximate minimum-variance smoother for output estimation
can be implemented via the following three-step procedure.
Step 1. Operate


$$\alpha_k = \Omega_k^{-1/2} (z_k - c_k(\hat{x}_{k/k-1})), \qquad (69)$$
$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (z_k - c_k(\hat{x}_{k/k-1})), \qquad (70)$$
$$\hat{x}_{k+1/k} = a_k(\hat{x}_{k/k}), \qquad (71)$$


on the measurement $z_k$, where $L_k = P_{k/k-1} C_k^T \Omega_k^{-1}$, $\Omega_k = C_k P_{k/k-1} C_k^T + R_k$,

$$P_{k/k} = P_{k/k-1} - P_{k/k-1} C_k^T \Omega_k^{-1} C_k P_{k/k-1}, \qquad (72)$$
$$P_{k+1/k} = A_k P_{k/k} A_k^T + B_k Q_k B_k^T,$$

$A_k = \dfrac{\partial a_k}{\partial x}\Big|_{x = \hat{x}_{k/k}}$ and $C_k = \dfrac{\partial c_k}{\partial x}\Big|_{x = \hat{x}_{k/k-1}}$.


Step 2. Operate (69)–(71) on the time-reversed transpose of $\alpha_k$. Then take the
time-reversed transpose of the result to obtain $\beta_k$.
Step 3. Calculate the smoothed output estimate from


$$\hat{y}_{k/N} = z_k - R_k \beta_k. \qquad (73)$$


“You can recognize a pioneer by the arrows in his back” Beverly Rubik


10.5.2 Robust Smoother 


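From the arguments within Chapter 9, a smoother that is robust to uncertain $w_k$
and $v_k$ can be realised by replacing the error covariance correction (72) by

$$P_{k/k} = P_{k/k-1} - P_{k/k-1} \begin{bmatrix} C_k^T & C_k^T \end{bmatrix} \begin{bmatrix} C_k P_{k/k-1} C_k^T - \gamma^2 I & C_k P_{k/k-1} C_k^T \\ C_k P_{k/k-1} C_k^T & R_k + C_k P_{k/k-1} C_k^T \end{bmatrix}^{-1} \begin{bmatrix} C_k \\ C_k \end{bmatrix} P_{k/k-1}$$

within Procedure 1. As discussed in Chapter 9, a search for a minimum $\gamma$ such that
$\begin{bmatrix} C_k P_{k/k-1} C_k^T - \gamma^2 I & C_k P_{k/k-1} C_k^T \\ C_k P_{k/k-1} C_k^T & R_k + C_k P_{k/k-1} C_k^T \end{bmatrix} > 0$ and $P_{k/k-1} > 0$ over $k \in [1, N]$ is desired.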


10.5.3 Application 


Returning to the problem of demodulating a unity-amplitude FM signal, let

$$x_k = \begin{bmatrix} \omega_k \\ \phi_k \end{bmatrix}, \quad A = \begin{bmatrix} \mu_\omega & 0 \\ 1 & \mu_\phi \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \end{bmatrix}^T, \quad z_k^{(1)} = \cos(\phi_k) + v_k^{(1)}, \quad z_k^{(2)} = \sin(\phi_k) + v_k^{(2)},$$

where $\omega_k$, $\phi_k$, $z_k$ and $v_k$ denote the instantaneous frequency message, instantaneous
phase, complex observations and measurement noise respectively. A zero-mean
voiced speech utterance “a e i o u” was sampled at 8 kHz, for which estimates $\hat{\mu}_\omega = 0.97$
and $\hat{\sigma}_w^2 = 0.053$ were obtained using an expectation maximisation
algorithm.


“The farther the experiment is from theory, the closer it is to the Nobel Prize.” Irène Joliot-Curie,
winner of the 1935 Nobel Prize in Chemistry


An FM discriminator output [13],

$$z_k^{(3)} = \left( z_k^{(1)} \frac{dz_k^{(2)}}{dt} - z_k^{(2)} \frac{dz_k^{(1)}}{dt} \right) \left( (z_k^{(1)})^2 + (z_k^{(2)})^2 \right)^{-1}, \qquad (74)$$

serves as a benchmark and as an auxiliary frequency measurement for the above
smoother. The innovations within Steps 1 and 2 are given by

$$\begin{bmatrix} z_k^{(1)} \\ z_k^{(2)} \\ z_k^{(3)} \end{bmatrix} - \begin{bmatrix} \cos(\hat{x}_k^{(2)}) \\ \sin(\hat{x}_k^{(2)}) \\ \hat{x}_k^{(1)} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} \alpha_k^{(1)} \\ \alpha_k^{(2)} \\ \alpha_k^{(3)} \end{bmatrix} - \begin{bmatrix} \cos(\hat{x}_k^{(2)}) \\ \sin(\hat{x}_k^{(2)}) \\ \hat{x}_k^{(1)} \end{bmatrix},$$

respectively. A unity-amplitude FM signal was
synthesized using $\mu_\phi = 0.99$ and the SNR was varied in 1.5 dB steps from 3 dB to
15 dB. The mean-square errors were calculated over 200 realisations of Gaussian
measurement noise and are shown in Fig. 9. It can be seen from the figure that at
7.5 dB SNR, the first-order EKF improves on the FM discriminator MSE by about
12 dB. The improvement arises because the EKF demodulator exploits the signal
model whereas the FM discriminator does not. The figure shows that the
approximate minimum-variance smoother further reduces the MSE by about 2 dB,
which illustrates the advantage of exploiting all the data in the time interval. In the
robust designs, searches for minimum values of $\gamma$ were conducted such that the
corresponding Riccati difference equation solutions were positive definite over
each noise realisation. It can be seen from the figure at 7.5 dB SNR that the robust
EKF provides about a 1 dB performance improvement compared to the EKF,
whereas the approximate minimum-variance smoother and the robust smoother
performance are indistinguishable.


[Figure omitted. Axes: SNR, dB (horizontal); MSE, dB (vertical).]
Fig. 9. FM demodulation performance comparison: (i) FM discriminator (crosses), (ii) first-order EKF
(dotted line), (iii) Robust EKF (dashed line), (iv) approximate minimum-variance smoother and robust
smoother (solid line).


“They thought I was crazy, absolutely mad.” Barbara McClintock, winner of the 1983 Nobel Prize in 
Physiology or Medicine 
This nonlinear example illustrates once again that smoothers can outperform
filters. Since a first-order speech model is used and the Taylor series are truncated 
after the first-order terms, some model uncertainty is present, and so the robust 
designs demonstrate a marginal improvement over the EKF. 


10.6 Constrained Filtering and Smoothing 


10.6.1 Background 


Constraints often appear within navigation problems. For example, vehicle 
trajectories are typically constrained by road, tunnel and bridge boundaries. 
Similarly, indoor pedestrian trajectories are constrained by walls and doors. 
However, as constraints are not easily described within state-space frameworks, 
many techniques for constrained filtering and smoothing are reported in the 
literature. An early technique for constrained filtering involves augmenting the 
measurement vector with perfect observations [14]. The application of the perfect- 
measurement approach to filtering and fixed-interval smoothing is described in 
[15]. 


Constraints can be applied to state estimates, see [16], where a positivity 
constraint is used within a Kalman filter and a fixed-lag smoother. Three different 
state equality constraint approaches, namely, maximum-probability, mean-square 
and projection methods are described in [17]. Under prescribed conditions, the 
perfect-measurement and projection approaches are equivalent [5], [18], which is 
identical to applying linear constraints within a form of recursive least squares. 


In the state equality constrained methods [5], [16] — [18], a constrained estimate 
can be calculated from a Kalman filter’s unconstrained estimate at each time step. 
Constraint information could also be embedded within nonlinear models for use 
with EKFs. A simpler, low-computation-cost technique that avoids EKF stability
problems and suits real-time implementation is described in [19]. In particular, an 
on-line procedure is proposed that involves using nonlinear functions to censor the 
measurements and subsequently applying the minimum-variance filter recursions. 
An off-line procedure for retrospective analyses is also described, where the 
minimum-variance fixed-interval smoother recursions are applied to the censored 
measurements. In contrast to the afore-mentioned techniques, which employ 
constraint matrices and vectors, here constraint information is represented by an 
exogenous input process. This approach uses the Bounded Real Lemma which 
enables the nonlinearities to be designed so that the filtered and smoothed 
estimates satisfy a performance criterion. 


“If at first, the idea is not absurd, then there is no hope for it.” Albert Einstein

10.6.2 Problem Statement


The ensuing discussion concerns odd and even functions which are defined as
follows. A function $g_o$ of $X$ is said to be odd if $g_o(-X) = -g_o(X)$. A function $f_e$ of $X$
is said to be even if $f_e(-X) = f_e(X)$. The product of $g_o$ and $f_e$ is an odd function
since $g_o(-X) f_e(-X) = -g_o(X) f_e(X)$.


Problems are considered where stochastic random variables are subjected to
inequality constraints. Therefore, nonlinear censoring functions are introduced
whose outputs are constrained to lie within prescribed bounds. Let $\beta \in \mathbb{R}^p$ and
$g_o : \mathbb{R}^p \to \mathbb{R}^p$ denote a constraint vector and an odd function of a random
variable $X \in \mathbb{R}^p$ about its expected value $E\{X\}$, respectively. Define the
censoring function


$$g(X) = E\{X\} + g_o(X, \beta), \qquad (75)$$

where

$$g_o(X, \beta) = \begin{cases} \beta & \text{if } \beta \le X - E\{X\}, \\ X - E\{X\} & \text{if } -\beta < X - E\{X\} < \beta, \\ -\beta & \text{if } X - E\{X\} \le -\beta. \end{cases} \qquad (76)$$


By inspection of (75)–(76), $g(X)$ is constrained within $E\{X\} \pm \beta$. Suppose that the
probability density function of $X$ about $E\{X\}$ is even, that is, is symmetric about
$E\{X\}$. Under these conditions, the expected value of $g(X)$ is given by


$$E\{g(X)\} = \int_{-\infty}^{\infty} g(x) f_e(x)\,dx = E\{X\} \int_{-\infty}^{\infty} f_e(x)\,dx + \int_{-\infty}^{\infty} g_o(x, \beta) f_e(x)\,dx = E\{X\}, \qquad (77)$$


since $\int_{-\infty}^{\infty} f_e(x)\,dx = 1$ and the product $g_o(x, \beta) f_e(x)$ is odd.


Thus, a constraining process can be modelled by a nonlinear function. Equation
(77) states that $g(X)$ is unbiased, provided that $g_o(X, \beta)$ and $f_e(X)$ are odd and even
functions about $E\{X\}$, respectively. In the analysis and examples that follow,
attention is confined to systems having zero-mean inputs, states and outputs, in
which case the censoring functions are also centred on zero, that is, $E\{X\} = 0$.
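
The censoring function (75)–(76) amounts to clipping the deviation from the mean. A minimal sketch follows; the function name `censor` and its `mean` argument are hypothetical, and NumPy's `clip` is used as a convenient stand-in for the piecewise definition.

```python
import numpy as np

def censor(x, beta, mean=0.0):
    """g(X) = E{X} + g_o(X, beta): clip the deviation from the mean to [-beta, beta]."""
    x = np.asarray(x, dtype=float)
    return mean + np.clip(x - mean, -beta, beta)

# Samples placed symmetrically about zero keep a zero sample mean after censoring,
# which mirrors the unbiasedness property (77).
samples = np.array([-3.0, -0.2, 0.2, 3.0])
censored = censor(samples, beta=1.0)
```

Because the clipped deviation is an odd function of the deviation itself, symmetric inputs produce symmetric outputs and the mean is preserved.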


“It was not easy for a person brought up in the ways of classical thermodynamics to come around to the
idea that gain of entropy eventually is nothing more nor less than loss of information.” Gilbert Newton
Lewis

Let $w_k = [w_{1,k} \; \ldots \; w_{m,k}]^T \in \mathbb{R}^m$ represent a stochastic white input process
having an even probability density function, with $E\{w_k\} = 0$, $E\{w_j w_k^T\} = Q_k \delta_{jk}$,
in which $\delta_{jk}$ denotes the Kronecker delta function. Suppose that the states of a
system $\mathcal{G} : \mathbb{R}^m \to \mathbb{R}^p$ are realised by


$$x_{k+1} = A_k x_k + B_k w_k, \qquad (78)$$


where $A_k \in \mathbb{R}^{n \times n}$ and $B_k \in \mathbb{R}^{n \times m}$. Since $w_k$ is zero-mean, it follows that linear
combinations of the states are also zero-mean. Suppose also that the system
outputs, $y_k$, are generated by


$$y_k = \begin{bmatrix} y_{1,k} \\ \vdots \\ y_{p,k} \end{bmatrix} = \begin{bmatrix} g_o(C_{1,k} x_k, \theta_{1,k}) \\ \vdots \\ g_o(C_{p,k} x_k, \theta_{p,k}) \end{bmatrix}, \qquad (79)$$


where $C_{j,k}$ is the $j$th row of $C_k \in \mathbb{R}^{p \times n}$, $\theta_k = [\theta_{1,k} \; \ldots \; \theta_{p,k}]^T \in \mathbb{R}^p$ is an input
constraint process and $g_o(C_{j,k} x_k, \theta_{j,k})$, $j = 1, \ldots, p$, is an odd censoring function
centred on zero. The outputs $y_{j,k}$ are constrained to lie within $\pm\theta_{j,k}$, that is,


$$-\theta_{j,k} \le y_{j,k} \le \theta_{j,k}. \qquad (80)$$


For example, if the system outputs represent the trajectories of pedestrians within 
a building then the constraint process could include knowledge about wall, floor 
and ceiling positions. Similarly, a vehicle trajectory constraint process could 
include information about building and road boundaries. 

Assume that observations $z_k = y_k + v_k$ are available, where $v_k \in \mathbb{R}^p$ is a stochastic,
white measurement noise process having an even probability density function,
with $E\{v_k\} = 0$, $E\{v_j v_k^T\} = R_k \delta_{jk}$ and $E\{w_j v_k^T\} = 0$. It is convenient
to define the stacked vectors $y = [y_1^T \; \ldots \; y_N^T]^T$ and $\theta = [\theta_1^T \; \ldots \; \theta_N^T]^T$. It follows
that


$$\|y\|_2^2 \le \|\theta\|_2^2. \qquad (81)$$


Thus, the energy of the system’s output is bounded from above by the energy of
the constraint process.


The minimum-variance filter and smoother, which produce estimates of a linear
system’s output, minimise the mean square error. Here, it is desired to calculate
estimates that trade off minimum mean-square-error performance and achieve

$$\|\hat{y}\|_2^2 \le \|\theta\|_2^2. \qquad (82)$$

Note that (80) implies (81) but the converse is not true. Although estimates
$\hat{y}_{j,k}$ of $y_{j,k}$ satisfying $-\theta_{j,k} \le \hat{y}_{j,k} \le \theta_{j,k}$ are desirable, the procedures
described below only ensure that (82) is satisfied.


“Man's greatest asset is the unsettled mind.” Isaac Asimov


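10.6.3 Constrained Filtering


A procedure is proposed in which a linear filter $\mathcal{H} : \mathbb{R}^p \to \mathbb{R}^p$ is used to calculate
estimates $\hat{y}$ from zero-mean measurements $z_k$ that are constrained using an odd
censoring function to obtain


$$\bar{z}_k = \begin{bmatrix} \bar{z}_{1,k} \\ \vdots \\ \bar{z}_{p,k} \end{bmatrix} = \begin{bmatrix} g_o(z_{1,k}, \gamma^{-1}\theta_{1,k}) \\ \vdots \\ g_o(z_{p,k}, \gamma^{-1}\theta_{p,k}) \end{bmatrix}, \qquad (83)$$

which satisfy

$$\|\bar{z}\|_2^2 \le \gamma^{-2} \|\theta\|_2^2, \qquad (84)$$

where $\bar{z} = [\bar{z}_1^T \; \ldots \; \bar{z}_N^T]^T$, for a positive $\gamma \in \mathbb{R}$ to be designed. This design
problem is depicted in Fig. 10.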


[Block diagram omitted.]
Fig. 10. The constrained filtering design problem. The task is to design a scalar $\gamma$ so that the outputs of
a filter $\mathcal{H}$ operating on the censored zero-mean measurements $[\bar{z}_{1,k}^T \; \ldots \; \bar{z}_{p,k}^T]^T$ produce output
estimates $[\hat{y}_{1,k}^T \; \ldots \; \hat{y}_{p,k}^T]^T$, which trade off mean square error performance and achieve
$\|\hat{y}\|_2^2 \le \|\theta\|_2^2$.


“A mind that is stretched by a new idea can never go back to its original dimensions.” Oliver Wendell
Holmes


Censoring the measurements is suggested as a low-implementation-cost approach
to constrained filtering. Design constraints are sought for the measurement
censoring functions so that the outputs of a subsequent filter satisfy the
performance objective (82). The recursions akin to the minimum-variance filter
are applied to calculate predicted and filtered state estimates from the constrained
measurements $\bar{z}_k$ at time $k$. That is, the output mapping $C_k$ is retained within the
linear filter design even though nonlinearities are present within (83). The predicted
states, filtered states and output estimates are respectively obtained as


$$\hat{x}_{k+1/k} = (A_k - K_k C_k) \hat{x}_{k/k-1} + K_k \bar{z}_k, \qquad (85)$$
$$\hat{x}_{k/k} = (I - L_k C_k) \hat{x}_{k/k-1} + L_k \bar{z}_k, \qquad (86)$$
$$\hat{y}_{k/k} = C_k \hat{x}_{k/k}, \qquad (87)$$


where $L_k = P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1}$, $K_k = A_k L_k$ and $P_{k/k-1} = P_{k/k-1}^T > 0$ is
obtained from $P_{k/k} = P_{k/k-1} - P_{k/k-1} C_k^T (C_k P_{k/k-1} C_k^T + R_k)^{-1} C_k P_{k/k-1}$, $P_{k+1/k} =
A_k P_{k/k} A_k^T + B_k Q_k B_k^T$. Nonzero-mean sequences can be accommodated using
deterministic inputs as described in Chapter 4. Since a nonlinear system output
(79) and a nonlinear measurement (83) are assumed, the estimates calculated from
(85)–(87) are not optimal. Some properties that are exhibited by these estimates
are described below.
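
The on-line procedure, censoring each measurement and then applying the minimum-variance filter recursions, can be sketched as follows. This is an illustration only: the function name `constrained_filter` is hypothetical, time-invariant matrices are assumed for brevity, and the hard-limiting form of the censoring function is used with the bound scaled by $\gamma^{-1}$.

```python
import numpy as np

def constrained_filter(z, A, B, C, Q, R, theta, gamma, x0, P0):
    """Censor each measurement to +/- theta/gamma, then run the filter recursions."""
    x_pred, P_pred = np.array(x0, dtype=float), np.array(P0, dtype=float)
    y_hat = []
    for z_k in z:
        z_bar = np.clip(z_k, -theta / gamma, theta / gamma)   # censored measurement
        S = C @ P_pred @ C.T + R
        L = P_pred @ C.T @ np.linalg.inv(S)                   # gain L_k
        x_filt = x_pred + L @ (z_bar - C @ x_pred)            # filtered state
        y_hat.append(C @ x_filt)                              # output estimate
        P_filt = P_pred - L @ C @ P_pred
        x_pred = A @ x_filt                                   # predicted state (K_k = A_k L_k)
        P_pred = A @ P_filt @ A.T + B @ Q @ B.T
    return np.array(y_hat)
```

Since $K_k = A_k L_k$, propagating the filtered state through $A_k$ is equivalent to the predictor form $(A_k - K_k C_k)\hat{x}_{k/k-1} + K_k \bar{z}_k$.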


Lemma 3 [19]: In respect of the filter (85)–(87) which operates on the
constrained measurements (83), suppose the following:
(i) the probability density functions associated with $w_k$ and $v_k$ are even;
(ii) the nonlinear functions within (79) and (83) are odd; and
(iii) the filter is initialised with $\hat{x}_{0/0} = E\{x_0\}$.
Then the following applies:
(i) the predicted state estimates, $\hat{x}_{k+1/k}$, are unbiased;
(ii) the corrected state estimates, $\hat{x}_{k/k}$, are unbiased; and
(iii) the output estimates, $\hat{y}_{k/k}$, are unbiased.


Proof: (i) Condition (iii) implies $E\{\tilde{x}_{1/0}\} = 0$, which is the initialisation step of an
induction argument. It follows from (85) that


$$\hat{x}_{k+1/k} = (A_k - K_k C_k) \hat{x}_{k/k-1} + K_k (C_k x_k + v_k) + K_k (\bar{z}_k - C_k x_k - v_k). \qquad (88)$$


Subtracting (88) from (78) gives $\tilde{x}_{k+1/k} = (A_k - K_k C_k) \tilde{x}_{k/k-1} + B_k w_k - K_k v_k -
K_k (\bar{z}_k - C_k x_k - v_k)$ and therefore


$$E\{\tilde{x}_{k+1/k}\} = (A_k - K_k C_k) E\{\tilde{x}_{k/k-1}\} + B_k E\{w_k\} - K_k E\{v_k\} - K_k E\{\bar{z}_k - C_k x_k - v_k\}. \qquad (89)$$


From the above assumptions, the second and third terms on the right-hand-side of
(89) are zero. The property (77) implies $E\{\bar{z}_k\} = E\{z_k\} = E\{C_k x_k + v_k\}$ and so
$E\{\bar{z}_k - C_k x_k - v_k\}$ is zero. The first term on the right-hand-side of (89) pertains to
the unconstrained Kalman filter and is zero by induction. Thus, $E\{\tilde{x}_{k+1/k}\} = 0$.


(ii) Condition (iii) again serves as an induction assumption. It follows from (86)
that


$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k (C_k x_k + v_k - C_k \hat{x}_{k/k-1}) + L_k (\bar{z}_k - C_k x_k - v_k). \qquad (90)$$

Substituting $x_k = A_{k-1} x_{k-1} + B_{k-1} w_{k-1}$ into (90) yields $\tilde{x}_{k/k} = (I - L_k C_k) A_{k-1} \tilde{x}_{k-1/k-1}
+ (I - L_k C_k) B_{k-1} w_{k-1} - L_k v_k - L_k (\bar{z}_k - C_k x_k - v_k)$ and
$E\{\tilde{x}_{k/k}\} = (I - L_k C_k) A_{k-1} E\{\tilde{x}_{k-1/k-1}\} = (I - L_k C_k) A_{k-1} \cdots (I - L_1 C_1) A_0 E\{\tilde{x}_{0/0}\}$.
Hence, $E\{\tilde{x}_{k/k}\} = 0$ by induction.

(iii) Defining $\tilde{y}_{k/k} = y_k - \hat{y}_{k/k} = y_k + C_k x_k - C_k x_k - C_k \hat{x}_{k/k} =
C_k \tilde{x}_{k/k} + y_k - C_k x_k$ and using (77) leads to $E\{\tilde{y}_{k/k}\} = C_k E\{\tilde{x}_{k/k}\}
+ E\{y_k - C_k x_k\} = C_k E\{\tilde{x}_{k/k}\} = 0$ under condition (iii).


“All truth passes through three stages: First, it is ridiculed; Second, it is violently opposed; and Third, it
is accepted as self-evident.” Arthur Schopenhauer


Recall that the Bounded Real Lemma (see Lemma 7 of Chapter 9) specifies a
bound for a ratio of a system’s output and input energies. This lemma is used to
find a design for $\gamma$ within (83) as described below.


Lemma 4 [19]: Consider the filter (85)–(87) which operates on the constrained
measurements (83). Let $\bar{A}_k = A_k - K_k C_k$, $\bar{B}_k = K_k$, $\bar{C}_k = C_k (I - L_k C_k)$ and $\bar{D}_k
= C_k L_k$ denote the state-space parameters of the filter. Suppose for a given $\gamma_2 > 0$,
that a solution $M_k = M_k^T > 0$ exists over $k \in [1, N]$ for the Riccati difference
equation resulting from the application of the Bounded Real Lemma to the system
$\begin{bmatrix} \bar{A}_k & \bar{B}_k \\ \bar{C}_k & \bar{D}_k \end{bmatrix}$. Then the design $\gamma = \gamma_2$ within (83) results in the performance
objective (82) being satisfied.


Proof: For the application of the Bounded Real Lemma to the filter (85)–(87),
the existence of a solution $M_k = M_k^T > 0$ for the associated Riccati difference
equation ensures that $\|\hat{y}\|_2^2 \le \gamma_2^2 \|\bar{z}\|_2^2 - x_0^T M_0 x_0 \le \gamma_2^2 \|\bar{z}\|_2^2$, which together with
(84) leads to (82).


It is argued below that the proposed filtering procedure is asymptotically stable. 


“Everything we know is only some kind of approximation, because we know that we do not know all 
the laws yet. Therefore, things must be learned only to be unlearned again or, more likely, to be 
corrected.” Richard Phillips Feynman 
Lemma 5 [19]: Define the filter output estimation error as $\tilde{y} = y - \hat{y}$. Under
the conditions of Lemma 4, $\tilde{y} \in \ell_2$.


Proof: It follows from $\tilde{y} = y - \hat{y}$ that $\|\tilde{y}\|_2 \le \|y\|_2 + \|\hat{y}\|_2$, which together with
(81) and the result of Lemma 4 yields $\|\tilde{y}\|_2 \le 2\|\theta\|_2$, thus the claim follows.


10.6.4 Constrained Smoothing


In the sequel, it is proposed that the minimum-variance fixed-interval smoother
recursions operate on the censored measurements $\bar{z}_k$ to produce output estimates
$\hat{y}_{k/N}$ of $y_k$.


Lemma 6 [19]: In respect of the minimum-variance smoother recursions that
operate on the censored measurements $\bar{z}_k$, under the conditions of Lemma 3, the
smoothed estimates, $\hat{y}_{k/N}$, are unbiased.


The proof follows mutatis mutandis from the approach within the proofs of
Lemma 5 of Chapter 7 and Lemma 3. An analogous result to Lemma 5 is now
stated.


Lemma 7 [19]: Define the smoother output estimation error as $\tilde{y} = y - \hat{y}$.
Under the conditions of Lemma 3, $\tilde{y} \in \ell_2$.


The proof follows mutatis mutandis from that of Lemma 5. Two illustrative 
examples are set out below. A GPS and inertial navigation system integration 
application is detailed in [19]. 


Example 5 [19]. Consider the saturating nonlinearity
$$g_o(X, \beta) = 2\beta\pi^{-1} \arctan\left( \pi X (2\beta)^{-1} \right), \qquad (91)$$
which is a continuous approximation of (76) that satisfies $|g_o(X, \beta)| \le |\beta|$ and
$\dfrac{dg_o(X, \beta)}{dX} = \left( 1 + (\pi X)^2 (2\beta)^{-2} \right)^{-1} \approx 1$ when $(\pi X)^2 (2\beta)^{-2} \ll 1$. Data was
generated from (78), (79), where $A = \begin{bmatrix} 0.9 & 0 \\ 0 & 0.9 \end{bmatrix}$, $B = C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and $w_k$, $v_k$ are
Gaussian, white, zero-mean processes with $Q = R = \begin{bmatrix} 0.01 & 0 \\ 0 & 0.01 \end{bmatrix}$. The constraint
vector within (80) was chosen to be fixed, namely, $\theta_k = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$, $k \in$ [1, 10°].


“A man whose errors take ten years to correct is quite a man.” Julius Robert Oppenheimer


The limits of the observed distribution of estimates, $\hat{y}_{k/k} = \begin{bmatrix} \hat{y}_{1,k/k} \\ \hat{y}_{2,k/k} \end{bmatrix}$, arising by
operating the minimum-variance filter recursions on the raw data $z_k = y_k + v_k$ are
indicated by the outer black region of Fig. 11. It can be seen that the filter outputs
do not satisfy the performance objective (82), which motivates the pursuit of
constrained techniques. A minimum value of $\gamma_2 = 1.24$ was found for the solutions
of the Riccati difference equation specified within Lemma 4 to be
positive definite. The filter (85)–(87) was applied to the censored measurements

$$\bar{z}_k = \begin{bmatrix} \bar{z}_{1,k} \\ \bar{z}_{2,k} \end{bmatrix} = \begin{bmatrix} g_o(z_{1,k}, \gamma^{-1}\theta_{1,k}) \\ g_o(z_{2,k}, \gamma^{-1}\theta_{2,k}) \end{bmatrix}$$

using (91). The limits of the observed
distribution of the constrained filter estimates are indicated by the inner white
region of Fig. 11. The figure shows that the constrained filter estimates satisfy
(82), which illustrates Lemma 5.
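
The saturating nonlinearity (91) is straightforward to evaluate numerically. The sketch below, with the hypothetical name `g_arctan`, illustrates its three stated properties: it is odd, it is bounded in magnitude by $\beta$, and it is close to the identity for small arguments.

```python
import numpy as np

def g_arctan(x, beta):
    """Smooth censoring function g_o(X, beta) = (2 beta / pi) arctan(pi X / (2 beta))."""
    x = np.asarray(x, dtype=float)
    return (2.0 * beta / np.pi) * np.arctan(np.pi * x / (2.0 * beta))

# Evaluate on a grid for beta = 0.5; the output approaches but never exceeds +/- 0.5.
grid = np.linspace(-5.0, 5.0, 11)
values = g_arctan(grid, 0.5)
```

Unlike the hard limiter (76), this function is differentiable everywhere, at the cost of slightly compressing values that lie well inside the bounds.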


Example 6 [19]. Measurements were similarly synthesised using the parameters of 
Example 5 to demonstrate constrained fixed-interval smoother performance. A 
minimum value of $\gamma_2 = 5.6$ was found for the solutions of the Riccati difference
equation mentioned within Lemma 4 to be positive definite. The superimposed
distributions of the unconstrained and constrained smoothers are respectively
indicated by the outer black and inner white regions of Fig. 12. It can be seen by
inspection of the figure that the constrained smoother estimates meet (80),
whereas those produced by the standard smoother do not.


[Figures omitted. Axes: $\hat{y}_{1,k/k}$ versus $\hat{y}_{2,k/k}$ (Fig. 11) and $\hat{y}_{1,k/N}$ versus $\hat{y}_{2,k/N}$ (Fig. 12).]
Fig. 11. Superimposed distributions of filtered estimates for Example 5: unconstrained filter
(outer black); and constrained filter (middle white).
Fig. 12. Superimposed distributions of smoothed estimates for Example 6: unconstrained
smoother (outer black); and constrained smoother (middle white).


The above examples involved searching for a minimum value of $\gamma_2$ for the existence
of positive definite solutions for the Riccati equation alluded to within Lemma 4.


“An expert is a man who has made all the mistakes which can be made in a very narrow field.” Niels 
Henrik David Bohr 
The need for a search may not be apparent as stability is guaranteed whenever a
positive definite solution for the associated Riccati equation exists. Searching for a
minimum $\gamma_2$ is advocated because the use of an excessively large value can lead to
a nonlinearity design that is conservative and exhibits poor mean-square-error 
performance. If a design is still too conservative then an empirical value, namely, 


y= |, lz," , may need to be considered instead. 


10.6 Chapter Summary 


In this chapter it is assumed that nonlinear systems are of the form $x_{k+1} = a_k(x_k) +
b_k(w_k)$, $y_k = c_k(x_k)$, where $a_k(.)$, $b_k(.)$ and $c_k(.)$ are continuous differentiable
functions. The EKF arises by linearising the model about conditional mean
estimates and applying the standard filter recursions. The first, second and third-
order EKFs simplified for the case of $x_k \in \mathbb{R}$ are summarised in Table 1.


The EKF attempts to produce locally optimal estimates. However, it is not 
necessarily stable because the solutions of the underlying Riccati equations are not 
guaranteed to be positive definite. The faux algebraic Riccati technique trades off 
approximate optimality for stability. The familiar structure of the EKF is retained 
but stability is achieved by selecting a positive definite solution to a faux Riccati 
equation for the gain design. 


$H_\infty$ techniques can be used to recast nonlinear filtering applications into a model
uncertainty problem. It is demonstrated with the aid of an example that a robust 
EKF can reduce the mean square error when the problem is sufficiently nonlinear. 


Linearised models may be applied within the previously-described smoothers in 
the pursuit of performance improvement. Nonlinear versions of the fixed-lag, 
Fraser-Potter and Rauch-Tung-Striebel smoothers are easier to implement as they 
are less complex. However, the application of the minimum-variance smoother 
can yield approximately optimal estimates when the problem becomes linear, 
provided that the underlying assumptions are correct. A smoother that is robust to 
input uncertainty is obtained by replacing the approximate error covariance 
correction with an $H_\infty$ version. The resulting robust nonlinear smoother can exhibit
performance benefits when uncertainty is present. 


In some applications, it may be possible to censor a system’s inputs, states or 
outputs, rather than proceed with an EKF design. It has been shown that the use of 
a nonlinear censoring function to constrain input measurements leads to bounded 
filter and smoother estimation errors. 


“Most of what I learned as an entrepreneur was by trial and error.” Gordon Earl Moore 
Linearisation Predictor-Corrector Form
ses Oa, (x) Ker = Xp tL (& —% Xe) 
a a ea A A 

Ox ak Keine = A (Xe) 
v, 
na dc, (x) 
: rae | 
5 5 x= Kye 
< Be= by, Xx) 

oo) Sen ares 
Xp =X Z, —C,(X aie 
k/k k/k=1 | He Oe NAR kA 

ox x= Ka ig 2 Ox? A 
v XEXK/k-1 
za = Oc, (x) 2 
: eee | Ba in eae 
5 XK le=Sees teat ~ SIRT Sy Ok me 
3 a A =hep 
a Be= D(X) 

Ap= 
jan 
e 6a, (x) 1 P Oa, 
S Ox 6 ay 
= x= Fay xaKy 
z = 
an By = 


Table 1. Summary of first, second and third-order EKFs for the case of $x_k \in \mathbb{R}$.


10.7 Problems 


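Problem 1. Use the following Taylor series expansion of $f(x)$


$$f(x) = f(x_0) + \frac{1}{1!} \left( (x - x_0)^T \nabla \right) f(x_0) + \frac{1}{2!} \left( (x - x_0)^T \nabla \right)^2 f(x_0)$$
$$+ \frac{1}{3!} \left( (x - x_0)^T \nabla \right)^3 f(x_0) + \frac{1}{4!} \left( (x - x_0)^T \nabla \right)^4 f(x_0)$$


to find expressions for the coefficients $\alpha_i$ within the functions below.


(i) $f(x) = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)^2$.

(ii) $f(x) = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)^2 + \alpha_3 (x - x_0)^3$.

(iii) $f(x, y) = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)^2
+ \alpha_3 (y - y_0) + \alpha_4 (y - y_0)^2 + \alpha_5 (x - x_0)(y - y_0)$.

(iv) $f(x, y) = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)^2 + \alpha_3 (x - x_0)^3
+ \alpha_4 (y - y_0) + \alpha_5 (y - y_0)^2 + \alpha_6 (y - y_0)^3
+ \alpha_7 (x - x_0)(y - y_0) + \alpha_8 (x - x_0)^2 (y - y_0)
+ \alpha_9 (x - x_0)(y - y_0)^2$.

(v) $f(x, y) = \alpha_0 + \alpha_1 (x - x_0) + \alpha_2 (x - x_0)^2 + \alpha_3 (x - x_0)^3 + \alpha_4 (x - x_0)^4
+ \alpha_5 (y - y_0) + \alpha_6 (y - y_0)^2 + \alpha_7 (y - y_0)^3 + \alpha_8 (y - y_0)^4
+ \alpha_9 (x - x_0)(y - y_0) + \alpha_{10} (x - x_0)^2 (y - y_0)
+ \alpha_{11} (x - x_0)(y - y_0)^2 + \alpha_{12} (x - x_0)^3 (y - y_0)
+ \alpha_{13} (x - x_0)(y - y_0)^3 + \alpha_{14} (x - x_0)^2 (y - y_0)^2$.


“The capacity to blunder slightly is the real marvel of DNA. Without this special attribute, we would
still be anaerobic bacteria and there would be no music.” Lewis Thomas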


Problem 2. Consider a state estimation problem, where $x_{k+1} = a_k(x_k) + b_k w_k$,
$y_k = c_k(x_k)$, $z_k = y_k + v_k$, in which $w_k$, $x_k$, $y_k$, $v_k$, $a_k(.)$, $b_k$, $c_k(.) \in \mathbb{R}$.
Derive the
(i) first-order,
(ii) second-order,
(iii) third-order and
(iv) fourth-order EKFs,
assuming the required derivatives exist.

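Problem 3. Suppose that an FM signal is generated by $a_{k+1} = \mu_a a_k + w_k^{(1)}$,
$\omega_{k+1} = \mu_\omega \omega_k + w_k^{(2)}$, $\phi_{k+1} = \phi_k + \omega_k$,
$z_k^{(1)} = a_k \cos(\phi_k) + v_k^{(1)}$ and $z_k^{(2)} = a_k \sin(\phi_k) + v_k^{(2)}$.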


Write down the recursions for 
(i) first-order and 
(ii) second-order 
EKF demodulators. 


Problem 4. (Continuous-time EKF) Assume that continuous-time signals may
be modelled as $\dot{x}(t) = a(x(t)) + w(t)$, $y(t) = c(x(t))$, $z(t) = y(t) + v(t)$, where
$E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$ and $E\{v(t)v^T(\tau)\} = R(t)\delta(t - \tau)$.


(i) Show that approximate state estimates can be obtained from
$\dot{\hat{x}}(t) = a(\hat{x}(t)) + K(t)(z(t) - c(\hat{x}(t)))$, where $K(t) = P(t)C^T(t)R^{-1}(t)$,
$\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - K(t)C(t)P(t) + Q(t)$,
$A(t) = \left.\frac{\partial a(x)}{\partial x}\right|_{x = \hat{x}(t)}$ and $C(t) = \left.\frac{\partial c(x)}{\partial x}\right|_{x = \hat{x}(t)}$.
(ii) Often signal models are described in the above continuous-time
setting but sampled measurements $z_k$ of $z(t)$ are available. Write
down a hybrid continuous-discrete version of the EKF in corrector-
predictor form.


“I am quite conscious that my speculations run quite beyond the bounds of true science.” Charles
Robert Darwin 
Problem 5. Consider a pendulum of length $\ell$ that subtends an angle $\theta(t)$ with a
vertical line through its pivot. The pendulum’s angular acceleration and
measurements of its instantaneous horizontal position (from the vertical) may be
modelled as $\frac{d^2\theta(t)}{dt^2} = -\frac{g}{\ell}\sin(\theta(t)) + w(t)$ and $z(t) = \ell\sin(\theta(t)) + v(t)$,
respectively, where $g$ is the gravitational constant, and $w(t)$ and $v(t)$ are stochastic
inputs.

(i) Set out the pendulum’s equations of motion in a state-space form and
write down the continuous-time EKF for estimating $\theta(t)$ from $z(t)$.
(ii) Use Euler’s first-order integration formula to discretise the above
model and then detail the corresponding discrete-time EKF.


10.8 Glossary 


$\nabla f$ The gradient of a function $f$, which is a row-vector of partial
derivatives.

$\nabla^T \nabla f$ The Hessian of a function $f$, which is a matrix of second-order
partial derivatives.

tr($P_k$) The trace of a matrix $P_k$, which is the sum of its diagonal
terms.

FM Frequency modulation. 


10.9 References 


[1] A. P. Sage and J. L. Melsa, Estimation Theory with Applications to 
Communications and Control, McGraw-Hill Book Company, New York, 
1971. 

[2] A. Gelb, Applied Optimal Estimation, The Analytic Sciences Corporation, 
USA, 1974. 

[3] B. D. O. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall Inc, 
Englewood Cliffs, New Jersey, 1979. 

[4] T. Söderström, Discrete-time Stochastic Systems: Estimation and Control,
Springer-Verlag London Ltd., 2002.

[5] D. Simon, Optimal State Estimation, Kalman, H∞ and Nonlinear
Approaches, John Wiley & Sons, Inc., Hoboken, New Jersey, 2006.

[6] R. R. Bitmead, A.-C. Tsoi and P. J. Parker, “Kalman filtering approach to
short time Fourier analysis”, IEEE Transactions on Acoustics, Speech and
Signal Processing, vol. 34, no. 6, pp. 1493 – 1501, Jun. 1986.

[7] M.-A. Poubelle, R. R. Bitmead and M. Gevers, “Fake Algebraic Riccati
Techniques and Stability”, IEEE Transactions on Automatic Control, vol.
33, no. 4, pp. 379 – 381, Apr. 1988.


“What we observe is not nature itself, but nature exposed to our mode of questioning.” Werner 
Heisenberg 


[8] R. R. Bitmead, M. Gevers and V. Wertz, Adaptive Optimal Control. The
Thinking Man’s GPC, Prentice Hall, New York, 1990.

[9] R. R. Bitmead and Michel Gevers, “Riccati Difference and Differential
Equations: Convergence, Monotonicity and Stability”, In S. Bittanti, A. J.
Laub and J. C. Willems (Eds.), The Riccati Equation, Springer Verlag,
1991.

[10] G. A. Einicke, L. B. White and R. R. Bitmead, “The Use of Fake
Algebraic Riccati Equations for Co-channel Demodulation”, IEEE
Transactions on Signal Processing, vol. 51, no. 9, pp. 2288 – 2293, Sep.
2003.

[11] C. A. Desoer and M. Vidyasagar, Feedback Systems: Input Output
Properties, Academic Press, New York, 1975.

[12] G. A. Einicke and L. B. White, “Robust Extended Kalman Filtering”,
IEEE Transactions on Signal Processing, vol. 47, no. 9, pp. 2596 – 2599,
Sep. 1999.

[13] J. Aisbett, “Automatic Modulation Recognition Using Time Domain
Parameters”, Signal Processing, vol. 13, pp. 311 – 323, 1987.

[14] P. S. Maybeck, Stochastic models, estimation, and control, Academic
Press, New York, vol. 1, 1979.

[15] H. E. Doran, “Constraining Kalman filter and smoothing estimates to
satisfy time-varying restrictions”, Review of Economics and Statistics, vol.
74, no. 3, pp. 568 – 572, 1992.

[16] D. Massicotte, R. Z. Morawski and A. Barwicz, “Incorporation of a
Positivity Constraint Into A Kalman-Filter-Based Algorithm for
Correction of Spectrometric Data”, IEEE Transactions on Instrumentation
and Measurement, vol. 44, no. 1, pp. 2 – 7, 1995.

[17] D. Simon and T. L. Chia, “Kalman Filtering with State Equality
Constraints”, IEEE Transactions on Aerospace and Electronic Systems,
vol. 38, no. 1, pp. 128 – 136, 2002.

[18] S. J. Julier and J. J. LaViola, “On Kalman Filtering With Nonlinear
Equality Constraints”, IEEE Transactions on Signal Processing, vol. 55,
no. 6, pp. 2774 – 2784, Jun. 2007.

[19] G. A. Einicke, G. Falco and J. T. Malos, “Bounded Constrained Filtering
for GPS/INS Integration”, IEEE Transactions on Automatic Control,
2012 (to appear).


“We know nothing in reality; for truth lies in an abyss.” Democritus 


G. A. Einicke, Smoothing, Filtering and Prediction: Estimating
the Past, Present and Future (2nd ed.), Prime Publishing, 2019


11. Hidden Markov Model Filtering and 
Smoothing 


11.1 Introduction 


The previously-discussed optimal Kalman filter [1] — [3] is routinely used for 
tracking observed and unobserved states whose second-order statistics change 
over time. It is often assumed within Kalman filtering applications that one or 
more random variable sequences are generated by a random walk or an 
autoregressive process. That is, common Kalman filter parameterisations do not 
readily exploit knowledge about the random variables’ probability distributions. 
More precisely, the filter is optimal only for Gaussian variables whose first and 
second order moments completely specify all relevant probability distributions. 
For non-Gaussian data, the filter is only optimal over all linear filters [1].


Rather than assuming that random variable sequences are generated by 
autoregressive processes they may alternatively be modelled as Markov chains. 
The phrase ‘Markov chain’ was first coined in 1926 by a Russian mathematician 
S. N. Bernstein to acknowledge previous discoveries made by Andrei Andreevich 
Markov [4]. Markov was a professor at St Petersburg University and a member of 
the St Petersburg Academy of Sciences, which was a hub for scientific advances 
in many fields including probability theory. Indeed, Markov, along with fellow 
academy members D. Bernoulli, V. Y. Bunyakovsky and P. L. Chebyshev, all 
wrote textbooks on probability theory. Markov extended the weak law of large 
numbers and the central limit theorem to certain sequences of dependent random 
variables forming special classes of what are now known as Markov chains [4]. 


The basic theory of Hidden Markov models (HMMs) was first published by Baum 
et al in the 1960s [5]. HMMs were introduced to the speech recognition field in 
the 1970s by J. Baker at CMU [6], and F. Jelinek and his colleagues at IBM [7]. 
One of the most influential papers on HMM filtering and smoothing was the 
tutorial exposition by L. Rabiner [8], which has been accorded a large number of 
citations. Rabiner explained how to implement the forward-backward algorithm 
for estimating Markov state probabilities, together with the Baum-Welch 
algorithm (also known as the Expectation Maximisation algorithm). HMM filters 


“The alleged opinion that studies in seminars are of the highest scientific nature, while exercises in 
solving problems are of the lowest rank, is unfair. Mathematics to a considerable extent consists of 
solving problems, and together with proper discussion, this can be of the highest scientific nature.” 
Andrey Andreyevich Markov 
and smoothers can be advantageous in applications where sequences of alphabets
occur [8] - [10]. For example, in automatic speech recognition, sentence and
language models can be constructed by concatenating phoneme and word-level
HMMs. Similarly, stroke, character, word and context HMMs can be used in
handwriting recognition. HMMs have also been useful in modelling biological
sequences such as proteins and DNA.


The Doob—Meyer decomposition theorem [11] states that a stochastic process may 
be decomposed into the sum of two parts, namely, a prediction and an input 
process. The standard Kalman filter [1] makes use of both prediction plus input 
process assumptions and attains minimum-variance optimality. In contrast, the 
standard hidden Markov model filter/smoother relies exclusively on (Markov
model) prediction and is optimum in a Bayesian sense [8] - [10]. It is shown below 
that minimum-variance and HMM techniques can be combined for improved state 
recovery. 


The minimum-variance, HMM and combined-minimum-variance-HMM
predictions are only calculated from states at the previous time step. Improved 
predictions can be calculated from states at multiple previous time steps. The 
desired interdependencies between multiple previous states are conveniently 
captured by constructing high-order-Kronecker-product state vectors. The theory 
and implementation of such high-order-minimum-variance-HMM filters is also 
described below. 


The afore-mentioned developments are driven by our rapacious appetites for 
improved estimator performance. In principle, each additional embellishment, 
spanning HMM filters, minimum-variance-HMM filters to high-order-minimum- 
variance-HMM filters, has potential to provide further performance gains, subject 
to the usual proviso that the underlying modelling assumptions are correct. 
Needless to say, significantly higher calculation overheads must be reconciled 
against any performance benefits. 


Some prerequisites, namely, some results from probability theory including 
Markov processes, are introduced in Section 11.2. Bayes’ theorem is judiciously 
applied in Section 11.3 to derive the HMM filters and smoothers for time- 
homogenous processes. A state-space model having an output covariance 
equivalent to an HMM is derived in Section 11.4. This enables transition 
probability matrices to be employed in optimal filter and smoother constructions 
that minimise the error variance. Section 11.5 describes high-order-minimum- 
variance-HMM filters, which employ Kronecker product states. 


“It is remarkable that a science which began with the consideration of games of chance should have
become the most important object of human knowledge.” Pierre Simon Laplace 
11.2 Prerequisites


11.2.1 Bayes’ Theorem 


Bayes’ theorem (also known as Bayes’ rule) is used to update the probability of an 
event A in the light of new evidence B. Let Pr{A} and Pr{B} denote the (prior) 
probabilities of A and B occurring independently of each other. The (posterior) 
conditional probability of A given that B has happened can be expressed as 


$$\Pr\{A \mid B\} = \frac{\Pr\{A\}\,\Pr\{B \mid A\}}{\Pr\{B\}},$$


where Pr{B|A} is the conditional probability of B occurring given that A has
occurred. 


11.2.2 Joint Probabilities 
The probability of event A and event B occurring is given by

$$\Pr\{A, B\} = \Pr\{A \mid B\}\Pr\{B\} = \Pr\{B \mid A\}\Pr\{A\}.$$

The probability of events A, B and C occurring is given by

$$\Pr\{A, B, C\} = \Pr\{A \mid B, C\}\Pr\{B, C\} = \Pr\{A \mid B, C\}\Pr\{B \mid C\}\Pr\{C\}.$$


11.2.3 Total Probability Theorem 
The Total Probability Theorem (also known as the Law of Total Probability)
breaks up probability calculations into parts. Given n mutually exclusive events
$B_1, \ldots, B_n$ whose probabilities $\Pr\{B_1\}, \ldots, \Pr\{B_n\}$ sum to one, the probability
of A occurring is given by

$$\Pr\{A\} = \Pr\{A \mid B_1\}\Pr\{B_1\} + \cdots + \Pr\{A \mid B_n\}\Pr\{B_n\},$$

where $\Pr\{A \mid B_i\}$ is the conditional probability of A occurring given that $B_i$ has
occurred.
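
The two results above are often used together: the law of total probability supplies the normalising term for Bayes' theorem. The following is a minimal Python sketch of that arithmetic. The function names and the numerical values of the priors and likelihoods are hypothetical and chosen purely for illustration; here the roles are arranged so that the events $B_i$ are updated in the light of evidence A.

```python
def total_probability(likelihoods, priors):
    """Pr{A} = sum_i Pr{A|B_i} Pr{B_i} for mutually exclusive events B_i."""
    # The priors Pr{B_i} are assumed to sum to one.
    assert abs(sum(priors) - 1.0) < 1e-12, "priors must sum to one"
    return sum(l * p for l, p in zip(likelihoods, priors))


def bayes_posterior(likelihoods, priors):
    """Pr{B_i|A} = Pr{A|B_i} Pr{B_i} / Pr{A} for each i."""
    evidence = total_probability(likelihoods, priors)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


priors = [0.3, 0.7]        # hypothetical Pr{B_1}, Pr{B_2}
likelihoods = [0.9, 0.2]   # hypothetical Pr{A|B_1}, Pr{A|B_2}
pr_a = total_probability(likelihoods, priors)
posterior = bayes_posterior(likelihoods, priors)
```

With these illustrative numbers, Pr{A} = 0.9 × 0.3 + 0.2 × 0.7 = 0.41, and the posterior probabilities sum to one by construction.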


“The combination of Bayes’ and Markov Chain Monte Carlo has been called arguably the most 
powerful mechanism ever created for processing data and knowledge.” Sharon Bertsch McGrayne, The 
Theory That would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian 
Submarines, and Emerged Triumphant from Two centuries of Controversy 
11.2.4 Markov Chain


The terms stochastic process and random process are often used interchangeably 
and so it is appropriate to commence with some formal definitions. 


Definition 1. A discrete stochastic process is a sequence of events in which the
outcome at any stage depends on some probability. 


Definition 2. A discrete Markov process is a stochastic process with the following 
properties: 
(a) The number of possible outcomes is finite; 
(b) The outcome at the next stage depends only on the outcome of the 
current stage and not on the previous stages. 


Assumption 1 [8]: Let $x_k$ be a finite state, discrete-time, first-order Markov
process, that takes on (or has been quantised into) an alphabet of n discrete values
in a set, i.e.,

$$x_k \in \{q_1, \ldots, q_n\}, \quad q_i \in \mathbb{R}. \qquad (1)$$

The probability that $x_k$ takes on value $q_i$ is denoted by $\pi_k(i) = \Pr\{x_k = q_i\} \ge 0$. It is
convenient to stack these probabilities into a so-called probability distribution
vector as $\pi_k = [\pi_k(1), \pi_k(2), \ldots, \pi_k(n)]^T \in \mathbb{R}^n$ in which $\sum_{i=1}^{n} \pi_k(i) = 1$.
Attention is confined to stationary (or time-homogenous) stochastic processes, in
which the probabilities are constant over time. In particular, it is assumed that the
probability distribution vector states evolve according to

$$\pi_{k+1} = A\pi_k \qquad (2)$$

for all $k \in [1, \ldots, N]$, where $A = \{a_{i,j}\} \in \mathbb{R}^{n \times n}$, $a_{i,j} = \Pr\{x_{k+1} = q_i \mid x_k = q_j\}$, is a
column-stochastic transition probability matrix for $\pi_k$ satisfying $a_{i,j} \ge 0$,
$\sum_{i=1}^{n} a_{i,j} = 1$. A sequence or chain of states $\pi_k$ generated by (2) is known as a
first-order, discrete Markov chain. Suppose also that $x_k$ is hidden and observed
indirectly by measurements

$$y_k = x_k + v_k, \qquad (3)$$

where $v_k$ is a white, Gaussian measurement noise process. Equations (1) – (3) are
commonly referred to as an HMM.


“In science, progress is possible. In fact, if one believes in Bayes’ theorem, scientific progress is 
inevitable as predictions are made and as beliefs are tested and refined.” Nate Silver
Note that the row and column indices of A pertain to the next and current
probability distribution states, respectively, i.e.,

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{bmatrix},$$

in which row i corresponds to the next state and column j to the current state.

Let $\pi_1$ denote an initial probability distribution vector of the Markov chain. Then
repeated application of (2) yields $\pi_k = A^{k-1}\pi_1$, $k > 1$. If the Markov chain is
non-absorbing or irreducible (it can go from any state to any state) and is non-periodic,
it has a stationary (or limiting) distribution denoted by $\pi = \lim_{k \to \infty} \pi_k$. The stationary
distribution, which satisfies $\pi = A\pi$, is a useful quantity, and can be found by
singular value decomposition.


Example 1: Suppose that a Markov process $x_k$, k = 1, ..., 10, has the sequence 1, 1,
2, 2, 2, 1, 1, 2, 2, 2. By inspection, $\Pr\{x_{k+1} = 1 \mid x_k = 1\} = 0.5$,
$\Pr\{x_{k+1} = 2 \mid x_k = 1\} = 0.5$, $\Pr\{x_{k+1} = 1 \mid x_k = 2\} = 0.2$ and
$\Pr\{x_{k+1} = 2 \mid x_k = 2\} = 0.8$, which yields

$$A = \begin{bmatrix} 0.5 & 0.2 \\ 0.5 & 0.8 \end{bmatrix}.$$

The columns of A sum to unity (by construction). It is easily verified that the
stationary distribution is $\pi = [0.29, 0.71]^T$.
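
The calculations of Example 1 can be sketched in a few lines of Python using NumPy. The function names below are hypothetical. The sketch assumes that the states are labelled 1 to n, that every state is visited at least once before the final sample (so no column of counts is zero), and that the stationary distribution is taken from the null space of A − I, obtained here by singular value decomposition.

```python
import numpy as np


def estimate_transition_matrix(sequence, n):
    """Count transitions so that column j holds Pr{next = i | current = j}."""
    counts = np.zeros((n, n))
    for current, nxt in zip(sequence[:-1], sequence[1:]):
        counts[nxt - 1, current - 1] += 1.0   # states are labelled 1..n
    # Normalise each column so that it sums to one (column-stochastic).
    return counts / counts.sum(axis=0, keepdims=True)


def stationary_distribution(A):
    """Null vector of (A - I) via the SVD, rescaled to sum to one."""
    _, _, vt = np.linalg.svd(A - np.eye(A.shape[0]))
    pi = vt[-1]            # right singular vector of the smallest singular value
    return pi / pi.sum()   # dividing by the sum also fixes the sign


sequence = [1, 1, 2, 2, 2, 1, 1, 2, 2, 2]
A = estimate_transition_matrix(sequence, 2)
pi = stationary_distribution(A)
```

For the sequence of Example 1 this reproduces the matrix A above, and the stationary distribution is [2/7, 5/7], which rounds to [0.29, 0.71].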


11.3 HMM Filter and Smoother Derivation


11.3.1 HMM Filter 


It is convenient to temporarily make the following simplifying assumption. 


Assumption 2 [8] – [9]: Without loss of generality, temporarily assume that $x_k$ is a
finite state, discrete-time, first-order Markov process, that takes on integer index
values

$$x_k \in \{1, \ldots, n\}. \qquad (4)$$

The corresponding transition probability matrix in (2) is now $A = \{a_{i,j}\} \in \mathbb{R}^{n \times n}$,
$a_{i,j} = \Pr\{x_{k+1} = i \mid x_k = j\}$. Assume again that indirect measurements (3) are
available.


"I always avoid prophesying beforehand because it is much better to prophesy after the event has 
already taken place." Sir Winston Leonard Spencer-Churchill 
The filtering objective is to find optimal estimates $\hat{x}_{k/k} \in \{1, \ldots, n\}$ of the hidden
states $x_k$, for the measurement sequence $y_1$ to $y_k$. From the approach of [8], suitable
estimates are found from the a posteriori state probabilities at time k and
observations $y_1$ to $y_k$,

$$\alpha_k(i) = \Pr\{x_k = i, y_1, \ldots, y_k\}. \qquad (5)$$

The cost of evaluating (5) grows exponentially with k and therefore a recursive
expression for $\alpha_k(i)$ is derived below. Applying the law of total probability and the
joint probabilities formula to (5) then simplifying using the Markov property for
observations yields

$$\alpha_k(i) = \sum_{j=1}^{n} \Pr\{x_k = i, x_{k-1} = j, y_1, \ldots, y_{k-1}, y_k\}$$

$$= \sum_{j=1}^{n} \Pr\{y_k \mid x_k = i, x_{k-1} = j, y_1, \ldots, y_{k-1}\} \Pr\{x_k = i \mid x_{k-1} = j, y_1, \ldots, y_{k-1}\} \Pr\{x_{k-1} = j, y_1, \ldots, y_{k-1}\}$$

$$= \Pr\{y_k \mid x_k = i\} \sum_{j=1}^{n} \Pr\{x_k = i \mid x_{k-1} = j\} \Pr\{x_{k-1} = j, y_1, \ldots, y_{k-1}\}$$

$$= C_k(i) \sum_{j=1}^{n} a_{i,j}\,\alpha_{k-1}(j), \qquad (6)$$

where

$$C_k(i) = \Pr\{y_k \mid x_k = i\} \qquad (7)$$

are conditional observation likelihoods (and called observation probability
distributions in [8]). The conditional observation likelihoods are assumed to be
known since they can be calculated entirely from the measurements (3).

The $\alpha_k(i)$ are filtered probability distributions and are often termed forward
probabilities. The filtered indices (under temporary Assumption 2) are obtained as
the largest or maximum a posteriori (MAP) estimate [8] - [9],

$$\hat{i} = \arg\max_{1 \le i \le n} \alpha_k(i). \qquad (8)$$

Returning to Assumption 1, the filtered values are

$$\hat{x}_{k/k} = q_{\hat{i}}. \qquad (9)$$


“An approximate answer to the right problem is worth a good deal more than an exact answer to an 
approximate problem." John Wilder Tukey 


Thus, the HMM filter for the signal model (1) – (3) is parameterised by a
transition probability matrix (2), conditional observation likelihoods (7), together
with a posteriori state probabilities (6), and the filtered values arise as the most
likely estimate (9).
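
The forward recursion (6) and the MAP selection (8) – (9) can be sketched as follows. This is an illustrative Python implementation rather than a definitive one: the function names, the initial distribution `pi1`, the example levels `q` and the measurements `y` are assumptions, the likelihoods take the Gaussian form (16), and each $\alpha_k$ is rescaled to sum to one at every step. The rescaling is a numerical safeguard against underflow; it leaves the argmax in (8) unchanged.

```python
import numpy as np


def gaussian_likelihoods(yk, q, sigma_v):
    """C_k(i) of (7), using the Gaussian form (16) about each level q_i."""
    return np.exp(-(yk - q) ** 2 / (2.0 * sigma_v ** 2)) / (np.sqrt(2.0 * np.pi) * sigma_v)


def hmm_filter(y, A, q, sigma_v, pi1):
    """Forward recursion (6) with per-step scaling, then MAP estimates (8)-(9)."""
    alpha = np.zeros((len(y), len(q)))
    for k, yk in enumerate(y):
        # Prediction: the initial distribution at the first step, A alpha_{k-1} afterwards.
        predicted = pi1 if k == 0 else A @ alpha[k - 1]
        unnormalised = gaussian_likelihoods(yk, q, sigma_v) * predicted
        alpha[k] = unnormalised / unnormalised.sum()
    x_hat = q[np.argmax(alpha, axis=1)]   # map the MAP index back to its level
    return alpha, x_hat


q = np.array([1.0, 2.0])                    # hypothetical alphabet {q_1, q_2}
A = np.array([[0.5, 0.2], [0.5, 0.8]])      # transition matrix of Example 1
y = np.array([1.1, 0.9, 2.2, 1.8, 2.1])     # hypothetical noisy measurements
alpha, x_hat = hmm_filter(y, A, q, sigma_v=0.3, pi1=np.array([0.5, 0.5]))
```

For these illustrative measurements the filtered values follow the nearest level, giving the sequence 1, 1, 2, 2, 2.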


11.3.2 HMM Fixed-Interval Smoother 


The smoothing objective is to find optimal estimates $\hat{x}_{k/T} \in \{1, \ldots, n\}$ of the
hidden states $x_k$ given measurements $y_k$ over the entire fixed interval $k \in [1, \ldots, T]$.
The smoothed estimates are found using the well-known forward-backward
algorithm [8], which requires the forward probabilities that were derived in the
previous section, together with the backward probabilities of observations $y_{k+1}$ to
$y_T$ occurring given that $x_k = i$. From the approach of [8], the backward
probabilities are defined as

$$\beta_k(i) = \Pr\{y_{k+1}, \ldots, y_T \mid x_k = i\}. \qquad (10)$$

A recursive expression for $\beta_k(i)$ is derived below. Applying the law of total
probability, the joint probabilities formula in (10) and simplifying using the
Markov property for observations yields

$$\beta_k(i) = \sum_{j=1}^{n} \Pr\{y_{k+1}, \ldots, y_T, x_{k+1} = j \mid x_k = i\}$$

$$= \sum_{j=1}^{n} \Pr\{y_{k+2}, \ldots, y_T \mid y_{k+1}, x_k = i, x_{k+1} = j\} \Pr\{y_{k+1}, x_{k+1} = j \mid x_k = i\}$$

$$= \sum_{j=1}^{n} \Pr\{y_{k+2}, \ldots, y_T \mid y_{k+1}, x_k = i, x_{k+1} = j\} \Pr\{y_{k+1} \mid x_{k+1} = j, x_k = i\} \Pr\{x_{k+1} = j \mid x_k = i\}$$

$$= \sum_{j=1}^{n} \Pr\{y_{k+2}, \ldots, y_T \mid x_{k+1} = j\} \Pr\{y_{k+1} \mid x_{k+1} = j\} \Pr\{x_{k+1} = j \mid x_k = i\}$$

$$= \sum_{j=1}^{n} \beta_{k+1}(j)\,C_{k+1}(j)\,a_{j,i}. \qquad (11)$$

“By seeking and blundering we learn.” Johann Wolfgang von Goethe

Finally, the smoothed posterior probability distributions are calculated as

$$\gamma_k(i) = \Pr\{x_k = i \mid y_1, \ldots, y_T\}$$

$$= \frac{\Pr\{x_k = i, y_1, \ldots, y_k, y_{k+1}, \ldots, y_T\}}{\Pr\{y_1, \ldots, y_T\}}$$

$$= \frac{\Pr\{y_{k+1}, \ldots, y_T \mid x_k = i\} \Pr\{x_k = i, y_1, \ldots, y_k\}}{\Pr\{y_1, \ldots, y_T\}}$$

$$= \frac{\beta_k(i)\,\alpha_k(i)}{\Pr\{y_1, \ldots, y_T\}}. \qquad (12)$$

The normalisation term within (12) can be obtained as $\Pr\{y_1, \ldots, y_T\} = \sum_{i=1}^{n} \beta_k(i)\alpha_k(i)$
to ensure that $\sum_{i=1}^{n} \gamma_k(i) = 1$. The smoothed indices (under temporary
Assumption 2) are obtained as the largest or maximum a posteriori (MAP)
estimate [8] - [9],

$$\hat{i} = \arg\max_{1 \le i \le n} \gamma_k(i). \qquad (13)$$

Returning to Assumption 1, the smoothed values are

$$\hat{x}_{k/T} = q_{\hat{i}}. \qquad (14)$$

Thus, the HMM smoother for the signal model (1) – (3) is parameterised by the
forward, backward and smoothed probabilities, (6), (11) and (12), respectively.
The smoothed values arise as the MAP estimate (14).
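
A compact sketch of the forward-backward calculation is given below in Python. It is illustrative only: the function name, the array layout `C[k, i]` for the likelihoods $C_k(i)$, and the initial distribution `pi1` are assumptions. The backward recursion is started from $\beta_T(i) = 1$, which is the usual convention, and both recursions are rescaled at each step for numerical stability; since $\gamma_k$ is normalised over i at every k, the scaling does not alter the result.

```python
import numpy as np


def hmm_smoother(C, A, pi1):
    """Smoothed probabilities gamma[k, i] of (12) from scaled versions of (6) and (11).

    C[k, i] holds the conditional observation likelihood C_k(i).
    """
    T, n = C.shape
    alpha = np.zeros((T, n))
    beta = np.ones((T, n))            # beta_T(i) = 1 at the end of the interval
    for k in range(T):                # forward pass, equation (6)
        a = C[k] * (pi1 if k == 0 else A @ alpha[k - 1])
        alpha[k] = a / a.sum()
    for k in range(T - 2, -1, -1):    # backward pass, equation (11)
        # sum_j beta_{k+1}(j) C_{k+1}(j) a_{j,i} is the i-th entry of A^T (beta * C)
        b = A.T @ (beta[k + 1] * C[k + 1])
        beta[k] = b / b.sum()
    gamma = alpha * beta              # numerator of (12)
    return gamma / gamma.sum(axis=1, keepdims=True)


# Hypothetical likelihood values for T = 4 samples and n = 2 states.
C = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]])
A = np.array([[0.5, 0.2], [0.5, 0.8]])
gamma = hmm_smoother(C, A, pi1=np.array([0.5, 0.5]))
i_hat = np.argmax(gamma, axis=1) + 1  # MAP indices (13), labelled 1..n
```

The smoothed values (14) then follow by looking up the level $q_{\hat{i}}$ for each index in `i_hat`.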


The HMM filter and smoother designs require estimates for A and C, which can be
obtained as follows.


11.3.3 Transition Probability Matrix Estimation 


Recall that a transition probability matrix describes the transitions of a Markov
chain, and under Assumption 1 is defined as $A = \{a_{i,j}\} \in \mathbb{R}^{n \times n}$,
$a_{i,j} = \Pr\{x_{k+1} = q_i \mid x_k = q_j\}$. The $a_{i,j}$ are the probabilities of transitioning from the
current (column) state to the next (row) state. Thus, the columns of A represent
families of probability distributions.


Suppose hypothetically that a state in a Markov chain is absorbing, namely,
once the state is entered, it is impossible to leave. In this case, the column of A
corresponding to the state has a one on the main diagonal and zeros elsewhere.
Thus, if all the states are absorbing, A would be an identity matrix. Next consider a 
more general case where states in a Markov chain change slowly from their 
neighbouring states. In this case, the components of A would be distributed about 
the main diagonal. 


“There may be babblers, wholly ignorant of mathematics, who dare to condemn my hypothesis... I 
value them not and scorn their unfounded judgement.” Nicolaus Copernicus 
Now suppose that the probability distributions about the main diagonal of A are
Gaussian. Similarly to the approach in [9], assume that $a_{i,j} \sim N(q_i - \phi q_j, \sigma_x^2)$,
where $\phi$ is an autoregressive-order-1 coefficient that has been estimated for the
process $x_k$, the $q_i$ are linearly spaced over the range of $x_k$ and $\sigma_x^2$ is its sample
variance. It follows that the components of A can be estimated as

$$a_{i,j} = (\sqrt{2\pi}\,\sigma_x)^{-1} \exp\left(-(q_i - \phi q_j)^2 / (2\sigma_x^2)\right). \qquad (15)$$

The construction (15) results in each column of A being a vector of Gaussian
probability densities.
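
The construction (15) can be sketched in Python as follows. The function name and the example values of the levels, the autoregressive coefficient and the standard deviation are hypothetical. Note that sampled Gaussian densities do not in general sum to one, so the sketch optionally rescales each column to obtain a column-stochastic matrix as required by (2); this rescaling step is an assumption and is not part of (15) itself.

```python
import numpy as np


def gaussian_transition_matrix(q, phi, sigma_x, normalise=True):
    """Entry (i, j) follows (15): a Gaussian density of q_i about phi * q_j."""
    diff = q[:, None] - phi * q[None, :]     # q_i - phi q_j for all i, j
    A = np.exp(-diff ** 2 / (2.0 * sigma_x ** 2)) / (np.sqrt(2.0 * np.pi) * sigma_x)
    if normalise:
        A = A / A.sum(axis=0, keepdims=True)  # make each column sum to one
    return A


q = np.linspace(-1.0, 1.0, 5)   # hypothetical linearly spaced levels
A = gaussian_transition_matrix(q, phi=0.9, sigma_x=0.4)
```

With a coefficient close to one, the largest entry of each column lies on the main diagonal, which matches the earlier remark that slowly changing states give a matrix concentrated about its diagonal.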


11.3.4 Conditional Observation Likelihood Estimation 


Suppose that the measurement noise within (3) is a zero-mean, Gaussian process,
i.e., $v_k \sim N(0, \sigma_v^2)$. Following the approach of [9], it may be assumed that
$C_k(i) \sim N(y_k - q_i, \sigma_v^2)$. It follows that the components of C can be estimated as

$$C_k(i) = (\sqrt{2\pi}\,\sigma_v)^{-1} \exp\left(-(y_k - q_i)^2 / (2\sigma_v^2)\right). \qquad (16)$$

Similarly, the construction (16) results in each column of C being a vector of
Gaussian probability densities.


Example 2: A 6-second utterance of “a e i o u” was sampled at 44.1 kHz and
recorded in an MPEG-4 audio format to serve as an example speech signal $x_k \in \mathbb{R}$.
Noisy measurements $y_k$ were then generated by adding Gaussian, white noise
realisations to the speech signal. Minimum-variance filters and smoothers were
designed assuming a signal model $x_{k+1} = Ax_k + w_k$ with parameters
$A = E\{y_{k+1} y_k^T\}(E\{y_k y_k^T\} - R)^{-1}$ and $\sigma_w^2 = E\{y_k y_k^T\} - R - A(E\{y_k y_k^T\} - R)A^T$,
where R denotes the measurement noise variance,
estimated from the (noisy) measurements. The root-mean-square (RMS) error
performance of the minimum-variance filters and smoothers is indicated by the
dotted lines of Fig. 1(a) and 1(b), respectively. The measurements were also
quantised into n = 32 levels. The performance of HMM filters and smoothers, with
(15) – (16) estimated from the measurements, is indicated by the dashed lines of
Fig. 1(a) and 1(b), respectively. The performance of HMM filters and smoothers,
with (15) – (16) estimated from noiseless speech signals, is indicated by the solid
lines of Fig. 1(a) and 1(b), respectively. It can be seen that the HMM
filters/smoothers can outperform the optimal minimum-variance solutions at low
SNR, provided that noiseless “training” data is available for parameter estimation.
The figures also demonstrate that signal quantisation degrades fidelity and leads to
degraded high-SNR performance.


“A life spent making mistakes is not only more honourable, but more useful than a life spent doing 
nothing.” George Bernard Shaw 
Fig. 1(a) Filter RMSE versus SNR (dB) for: (i) Minimum-variance filter; (ii) HMM filter
trained on noisy measurements; and (iii) HMM filter trained on noiseless signals.

Fig. 1(b) Smoother RMSE versus SNR (dB) for: (i) Minimum-variance smoother; (ii) HMM
smoother trained on noisy measurements; and (iii) HMM smoother trained on noiseless
signals.


11.4 Minimum-Variance-HMM Filtering and 
Smoothing 


11.4.1 Motivation 


Optimal filters and smoothers [1] - [3], [12] minimise the error variance by 
exploiting knowledge of the signal generating and measurement noise processes. 
Many tracking problems involve repetitive trajectories. Within surface mines, for 
example, trucks repeatedly drive up and down the same haul roads. On stockpiles, 
dozers travel up and down similar inclines/declines. At export terminals, vehicles 
are driven along similar trajectories. This section is concerned with improving 
minimum-variance filter and smoother performance by additionally exploiting 
knowledge about such repetitive trajectories. 


In optimal Kalman filtering and smoothing applications [1] - [3], [12], it is 
routinely assumed that one or more state sequences are generated by a random 
walk or an autoregressive process. If the signal generation process has a low-pass 
spectrum then the optimal filter and smoother will also have low-pass spectra. 
That is, common Kalman filter parameterisations do not exploit knowledge about 
the random variables’ probability distributions. This shortcoming has motivated 
the development of the previously-discussed HMM filters and smoothers. 


Kalman and HMM filters/smoothers have different strengths and weaknesses. The 
optimal Kalman filter and smoother solutions minimise the variance of the 
estimation error. HMM filters/smoothers exploit knowledge about the random 
variables’ probability distributions and are optimal in a Bayesian sense but do not 
explicitly minimise the mean-square error (MSE). They model systems as Markov 


“There’s nothing quite as frightening as someone who knows they are right.” Michael Faraday 
chains which have mutually exclusive states. Consequently, HMM filter outputs
are discretised. Kalman filters employ predictions that involve linear combinations 
of previous states. Thus, Kalman filters are less suited to problems possessing 
mutually exclusive states and their outputs are not constrained by discretisation 
boundaries. 


Improved estimation performance is sought by exploiting both transition 
probability knowledge and minimum-error-variance optimality. This entails 
finding a state-space realisation of a stochastic process whose output is equivalent 
to that of an HMM. The desired stochastic model may then be used within 
filter/smoother designs to attain MSE performance benefits. 


11.4.2 Prior Literature 


There are two common solution approaches for including transition probability 
knowledge within filter/smoother designs. First, implementing coupled Kalman 
and HMM filters [14] - [22]. Second, parameterising a single minimum-variance 
filter for recovery of Markov chain states [15], [23] - [26]. 


An extended Kalman filter is coupled with an HMM filter in [14] — [15] for 
demodulating differential phase shift keyed and frequency modulated signals. A 
coupled HMM filter and Kalman filter can be used for interference suppression 
[16]. The merging of an HMM and a Kalman filter for tracking space vessel CO
concentrations is described in [17]. Time-domain and frequency-domain HMMs 
can be used to identify model parameters prior to filtering and smoothing of noisy 
speech [18] - [20]. In a position tracking application [21], an HMM is used to 
estimate a control input for a Kalman filter. In an image target tracking problem 
[22], an HMM is used to estimate position coordinates which then serve as a 
measurement input for a Kalman filter. 


Rather than coupling two filters, knowledge of a signal’s probability distributions 
can be used in the parameterisation of a single minimum-variance filter. This 
approach has been considered previously in [15], [23] - [26]. In [15], a system’s 
input is termed a semi-Martingale increment. A single-output, time-invariant, 
linear state model, which has the same second-order statistics as a hidden Markov 
chain (HMC), is described without proofs in [15]. A state-space system can be 
transformed into an innovations or prediction error representation [24] - [26]. It is 
assumed in [15], [25] that the transition probability matrix and the output matrix 
are known from which an input process covariance is calculated. These parameters 
are used to calculate an algebraic Riccati equation solution for which convergence 
is established in [25]. A subspace identification algorithm arises from a 
factorisation of the singular value decomposition of an estimated linear prediction 
matrix in [24]. This yields least-squares estimates of the transition probability and 


“Torture the data and it will confess to anything.” Ronald Coase, winner of the 1991 Nobel Prize in 
Economics 
gain matrices, without having to explicitly estimate an input covariance and solve
a Riccati equation. A rigorous understanding of Markov model subspace 
identification is developed in [26]. Although convergence and consistency results 
are developed, it is concluded in [26] that the problem of finding a stochastic 
matrix solution for a transition probability matrix remained unsolved. 


Many other techniques have been reported for identifying model parameters — see 
[27] - [39] and the references therein. The statistical properties of quantised 
variables are surveyed in [27] – [29] and the impact of quantisation on moment or
likelihood based approaches to system identification is studied in [31] - [32]. 
Candidate parameter estimation techniques include least-squares, EM algorithms, 
sub-space identification and convex optimisation. A closed-form least-squares 
estimate of a transition probability matrix is stated in [33]. EM algorithms are 
described in [8], [35]. Sub-space identification methods, which are also known as 
spectral algorithms, are surveyed in [24] - [26], [36] - [39]. These methods find 
approximate factorisations of estimated covariances between past and future 
observations. A spectral algorithm for learning discrete-observation HMMs is 
proposed in [37], which does not require estimating the transition and observation 
matrices. The spectral algorithm of [37] is generalised for HMMs that possess 
kernel structures in [38]. The assumptions within [37] are relaxed in [39], where
reduced-rank HMMs are developed.


11.4.3 Signal Models 


11.4.3.1 Hidden Markov Model 


Consider again a time-homogeneous Markov chain with n states, $X_k$, $k \ge 0$, which 
takes integer values, without loss of generality, in a finite set 
$\{1, 2, \ldots, n\} \subset \mathbb{R}$. Denote $\Pi = \mathrm{diag}(\pi_1, \ldots, \pi_n)$, 
where $\pi = [\pi_1, \pi_2, \ldots, \pi_n]^T \in \mathbb{R}^n$, $\pi_i = \Pr\{X_k = i\}$, is the 
unique stationary probability distribution of the states which satisfies 


$\pi = A\pi$, (17) 


in which $A = \{a_{ij}\} \in \mathbb{R}^{n \times n}$ is a column-stochastic transition 
probability matrix for $X_k$, i.e., $a_{ij} = \Pr\{X_{k+1} = i \mid X_k = j\}$. Let $Y_k$, 
$k \ge 0$, be a HMC, dependent on $X_k$, taking on values in a set 
$\mathcal{Y} \subset \mathbb{R}^p$ with probability distribution 


$E\{Y_k\} = M\pi$, (18) 


where $M \in \mathbb{R}^{p \times n}$ is an observation probability matrix. A brief derivation 
of unconditional second-order moments of the HMC, namely, $E\{Y_{k+\tau}Y_k^T\}$ and 
$E\{Y_kY_k^T\}$, is provided below. 
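The stationary distribution in (17) is the eigenvector of $A$ associated with the unit eigenvalue, scaled so that its elements sum to one. The following is a minimal sketch of that calculation, assuming NumPy is available; the function name `stationary_distribution` is hypothetical, and the numerical matrix is the one that appears later in Example 3.

```python
import numpy as np

def stationary_distribution(A):
    """Return pi with A @ pi = pi and sum(pi) = 1 for a column-stochastic A."""
    eigvals, eigvecs = np.linalg.eig(A)
    idx = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()                      # normalise; this also fixes the sign

# Column-stochastic transition matrix (each column sums to one).
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
pi = stationary_distribution(A)
Pi = np.diag(pi)                              # Pi = diag(pi_1, ..., pi_n)
```

The sketch assumes that the unit eigenvalue is simple, so that the stationary distribution is unique, as stated in the text.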


Following the approach in [23], assume there exist bounded conditional first and 
second order moments $m_i \in \mathbb{R}^p$ and $\Sigma_i \in \mathbb{R}^{p \times p}$ defined by 
$m_i = E\{Y_k \mid X_k = i\}$ and $\Sigma_i = E\{Y_kY_k^T \mid X_k = i\}$, respectively. It is 
observed that $E\{Y_k\} = \sum_{i=1}^{n} \pi_i m_i = M\pi$, where 
$M = [m_1 \; m_2 \; \ldots \; m_n] \in \mathbb{R}^{p \times n}$. Denote 
$\Phi(\tau) = \{\Phi_{ij}(\tau)\} \in \mathbb{R}^{n \times n}$, 
$\Phi_{ij}(\tau) = \Pr\{X_{k+\tau} = i \mid X_k = j\}$ for $\tau \ge 0$, which is consistent with 
$\Phi(0) = I$ and $\Phi(\tau) = A^{\tau}$. Assume that $Y_k$ satisfy the conditional 
independence property 


$\Pr\{Y_k, \ldots, Y_0 \mid X_k, \ldots, X_0\} = \prod_{t=0}^{k} \Pr\{Y_t \mid X_t\}$. (19) 


Without loss of generality, consider a simpler case in which the HMC means 
$M\pi$ are zero for all $k \ge 0$. Condition (19) ensures 
$E\{Y_{k+\tau}Y_k^T \mid X_{k+\tau} = j, X_k = i\} = m_j m_i^T$ when $\tau \ne 0$. Now the 
(unconditional) second order moments of the HMC $Y_k$ can be determined. For 
$\tau > 0$, 


$E\{Y_{k+\tau}Y_k^T\} = \int\int y_{k+\tau} y_k^T \Pr\{y_{k+\tau}, y_k\} \, dy_{k+\tau} \, dy_k$ 
$= \int\int y_{k+\tau} y_k^T \sum_{i,j=1}^{n} \Pr\{y_{k+\tau}, y_k \mid X_{k+\tau} = j, X_k = i\} \Pr\{X_{k+\tau} = j, X_k = i\} \, dy_{k+\tau} \, dy_k$ 
$= \sum_{i,j=1}^{n} m_j m_i^T \pi_i \Phi_{ji}(\tau) = M\Phi(\tau)\Pi M^T$. 


Also, 


$E\{Y_kY_k^T\} = \int y_k y_k^T \Pr\{y_k\} \, dy_k = \int y_k y_k^T \sum_{i=1}^{n} \Pr\{y_k \mid X_k = i\} \Pr\{X_k = i\} \, dy_k$ 
$= \sum_{i=1}^{n} (V_i + m_i m_i^T)\pi_i = \sum_{i=1}^{n} V_i \pi_i + M\Pi M^T$, 


where $V_i = \mathrm{cov}(Y_k \mid X_k = i, Y_k \mid X_k = i) > 0$ denotes the conditional observation 
covariance given $X_k = i$. 


11.4.3.2 State-Space Model 


Now consider the linear, time-invariant, state-space model 


$x_{k+1} = Fx_k + Gu_k$, (20) 
$y_k = Hx_k + Jw_k$, (21) 


where $x_k, u_k \in \mathbb{R}^n$, $y_k, w_k \in \mathbb{R}^p$, $F, G \in \mathbb{R}^{n \times n}$, 
$H \in \mathbb{R}^{p \times n}$ is known as an output mapping and $J \in \mathbb{R}^{p \times p}$. The 
exogenous input sequences $u_k$, $w_k$ are zero-mean, uncorrelated, white processes 
with unit covariances, i.e., $E\{u_j u_k^T\} = \delta_{jk} I_n$, 
$E\{w_j w_k^T\} = \delta_{jk} I_p$, $E\{u_j w_k^T\} = 0$, where $\delta_{jk}$ denotes the 
Kronecker delta function and $I_n \in \mathbb{R}^{n \times n}$, $I_p \in \mathbb{R}^{p \times p}$ are 
identity matrices. Derivations of predictors, filters and smoothers based on the 
state-space model (20) - (21) are detailed in Chapters 5 and 7. 


11.4.3.3 Equivalent Markov Model 


In order to design a filter and smoother that exploit transition probability 
knowledge and attain minimum-variance optimality, a state-space model that 
possesses a transition probability matrix and vector observations is developed. 
Following the approach in [23], it is shown below that the state space system (20) 
- (21) can be parameterised so that its output covariances match those for the 
HMC. Let $V_i = \mathrm{cov}(Y_k \mid X_k = i, Y_k \mid X_k = i) > 0$ denote the conditional 
observation covariance given $X_k = i$. 


Lemma 1 [23]: In respect of the above HMM and the state-space system (20) - 
(21), suppose that 1) $A$ and $\Pi$ are known, then the parameterisation: 


2) $H = M$; 
3) $F = A - \pi \mathbf{1}^T$, where $\mathbf{1} = [1, \ldots, 1]^T$; 
4) $P = \Pi > 0$; 
5) $G$ is any non-singular matrix satisfying $GG^T = \Pi - F\Pi F^T > 0$; 
6) $J$ is any non-singular matrix satisfying $JJ^T = \sum_{i=1}^{n} V_i \pi_i > 0$, 


results in: 


(a) $E\{y_k y_k^T\} = E\{Y_k Y_k^T\}$; 
(b) $E\{y_{k+\tau} y_k^T\} = E\{Y_{k+\tau} Y_k^T\}$ 


for all $k$ and $\tau$. 


Proof: (a) It follows from (21) that $E\{y_k y_k^T\} = HPH^T + JJ^T$, which together 
with conditions 2), 4) and 6) yield the result. 

(b) It is assumed that the initial mean $E\{x_0\} = 0$ and covariance 
$P = E\{x_0 x_0^T\} = E\{x_k x_k^T\}$ is the solution to the Lyapunov equation 
$P = FPF^T + GG^T$. Note that for $\tau > 0$, 


$x_{k+\tau} = \psi(\tau)x_k + \sum_{s=k}^{k+\tau-1} \psi(k+\tau-s-1) G u_s$, (22) 


where $\psi(\tau) = F^{\tau}$ for $\tau \ge 0$. So $E\{x_k\} = 0$ for all $k \ge 0$. From (22) 
and the properties of the process $u_k$ (i.e., $E\{u_s x_k^T\} = 0$ for all $s \ge k$) it 
holds that for $\tau > 0$, 


$E\{x_{k+\tau} x_k^T\} = \psi(\tau)E\{x_k x_k^T\} + \sum_{s=k}^{k+\tau-1} \psi(k+\tau-s-1) G E\{u_s x_k^T\} = \psi(\tau)P$, (23) 


and for $\tau < 0$, $E\{x_{k+\tau} x_k^T\} = P^T\psi^T(-\tau)$. It follows from (21) and (23) 
for $\tau > 0$ that 


$E\{y_{k+\tau} y_k^T\} = H\psi(\tau)PH^T$. (24) 


Thus, from Conditions 2) and 4), and (21), it needs to be shown for $\tau > 0$ that 


$M\Phi(\tau)\Pi M^T = H\psi(\tau)PH^T$. (25) 


For the initialisation step of an inductive argument, suppose (25) holds for 
$\tau = 1$, namely, $H\psi(1)PH^T = MA\Pi M^T = M\Phi(1)\Pi M^T$, since it is assumed that 
$M\pi = 0$. For the inductive step, suppose (25) holds for some $\tau > 0$ and consider 
$H\psi(\tau+1)PH^T = MF\Phi(\tau)\Pi M^T = MA\Phi(\tau)\Pi M^T = M\Phi(\tau+1)\Pi M^T$. 


The probability transition matrix $A$ possesses a maximal eigenvalue of 1 (see 
[41]). Condition 3) of Lemma 1 shifts the unstable eigenvalue to the origin and 
leaves the other eigenvalues unchanged. Thus, Lemma 1 specifies the matrices 
associated with a stable linear state space model so that the second-order moments 
of its output match those of a specified HMM. The input matrix $G$ can be selected 
so that the state covariance is diagonal (see also [23], [25]). That is, setting 
$P = \Pi > 0$, and noting that $F$ is stable, a non-singular $G$ can be found by Cholesky 
factorisation of $\Pi - F\Pi F^T$. This parameterisation is illustrated by the 
following example. 


Example 3: Consider $A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$ for which 
$\pi = [0.6 \;\; 0.4]^T$. A stable state matrix is found from 
$F = A - \pi \mathbf{1}^T = \begin{bmatrix} 0.1999 & -0.3 \\ -0.2 & 0.2999 \end{bmatrix}$. 
By setting $P = \Pi = \mathrm{diag}(0.6, 0.4)$, an input covariance is obtained as 
$GG^T = P - FPF^T$ and a Cholesky factorisation yields 
$G = \begin{bmatrix} 0.7348 & 0 \\ 0.0816 & 0.5773 \end{bmatrix}$. By construction, the 
solution of $P = FPF^T + GG^T$ is $P = \Pi$. 
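The numbers in Example 3 can be checked with a few lines of linear algebra. The sketch below assumes NumPy and uses the parameterisation of Lemma 1; the variable names are illustrative only. It also confirms that the unit eigenvalue of $A$ is moved to the origin while the other eigenvalue is unchanged.

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
pi = np.array([0.6, 0.4])
ones = np.ones(2)

F = A - np.outer(pi, ones)        # condition 3): F = A - pi 1^T
P = np.diag(pi)                   # condition 4): P = Pi
GGt = P - F @ P @ F.T             # condition 5): GG^T = Pi - F Pi F^T
G = np.linalg.cholesky(GGt)       # lower-triangular factor with G @ G.T == GGt

# Eigenvalue magnitudes before and after the shift.
eig_A = np.sort(np.abs(np.linalg.eigvals(A)))
eig_F = np.sort(np.abs(np.linalg.eigvals(F)))

# P = Pi should satisfy the Lyapunov equation P = F P F^T + G G^T.
lyapunov_residual = F @ P @ F.T + G @ G.T - P
```

Up to rounding, `F` and `G` agree with the values quoted in the example.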


11.4.4 Filter and Smoother Design Procedure 


11.4.4.1 Problem Assumptions 


The usual problem in HMC estimation is to determine the a posteriori 
probabilities of the states at each time, given the observations (in either a filtering 
or a smoothing context). Problems such as this can be solved using HMM filters 
and smoothers such as [8] - [10] discussed in Section 11.3, in which it is assumed 
that the underlying model is linear and the noises are Gaussian. 


A linear state space model is described above that motivates the application of the 
optimal filter and optimal smoother to discretised signals, which do not rely on 
Gaussian noise assumptions. Let $A_n$, $F_n$, $H_n$ and $G_n$ respectively denote $A$, $F$, 
$H$ and $G$ at n discretisation levels. The following is assumed: (1) observations 
$y_k = Y_k$ are available; (2) $H_n$ and n are selected by the designer; (3) the 
measurement noise covariance $JJ^T$ can be estimated from the observations during 
periods when the signal is absent; (4) $F_n$ and $G_n$ can be identified; and (5) the 
states $x_k$ can be estimated by a linear filter or a smoother that operate on $Y_k$. 


11.4.4.2 Identified Parameters 


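Suppose that observations $Y_k$, $k \in [1, N]$, are discretised into n levels to yield 
initial estimates $\hat{x}_k = [\hat{x}_{1,k}, \ldots, \hat{x}_{n,k}]^T = g_n(Y_k)$ of states $x_k$, 
where $g_n : \mathbb{R} \to \mathbb{R}^n$ denotes a discretisation function that produces 


$\hat{x}_{i,k} = \begin{cases} 1 & \text{if } h_i - \Delta/2 \le Y_k < h_i + \Delta/2 \\ 0 & \text{otherwise,} \end{cases}$ (26) 


in which $h_1, \ldots, h_n \in \mathbb{R}$ are discretisation level centres spaced $\Delta$ apart. 
Let $H_n = [h_1, \ldots, h_n] \in \mathbb{R}^{1 \times n}$ and denote its pseudo inverse by 
$H_n^{\dagger}$. The initial state estimates may be written as 
$\hat{x}_k = H_n^{\dagger}Y_k = x_k + H_n^{\dagger}Jw_k$. Since $x_k$ and $w_k$ are independent, it 
follows that 


$E\{\hat{x}_k \hat{x}_k^T\} = E\{x_k x_k^T\} + H_n^{\dagger}JJ^T(H_n^{\dagger})^T$. (27) 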


Standard least-squares estimates of $A_n$ and $G_nG_n^T$ can be calculated from [23], 
[25], [26] and [33] as 


$\hat{A}_n = E\{\hat{x}_{k+1}\hat{x}_k^T\}(E\{\hat{x}_k\hat{x}_k^T\})^{-1}$, (28) 


$\hat{G}_n\hat{G}_n^T = E\{\hat{x}_k\hat{x}_k^T\} - \hat{A}_n(E\{\hat{x}_k\hat{x}_k^T\})\hat{A}_n^T$. (29) 


Alternatively, $\hat{A}_n$ may be constructed as columns of conditional probability 
distributions, i.e., 


$\hat{A}_n = \left(\sum_{k=1}^{N-1} \hat{x}_{k+1}\hat{x}_k^T\right)\left(\sum_{k=1}^{N-1} \hat{x}_k\hat{x}_k^T\right)^{-1} = E\{\hat{x}_{k+1}\hat{x}_k^T\}(E\{\hat{x}_k\hat{x}_k^T\})^{-1}$, 


which is identical to (28). Thus, the least-squares estimate (28) is a stochastic 
transition probability matrix. The ensuing filter and smoother require that (29) is 
positive definite, which was established by the author of [23] as set out below. 
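To make the identification step concrete, the sketch below discretises a short scalar record into one-hot vectors in the manner of (26) and forms a sample version of the least-squares estimate (28). It assumes NumPy; the function names, the two level centres and the toy data are all hypothetical. A pseudo-inverse is used instead of an ordinary inverse (an assumption of this sketch) so that a level which is never visited does not cause a failure.

```python
import numpy as np

def discretise(Y, centres):
    """One-hot columns: X[i, k] = 1 if Y[k] lies in the bin centred on centres[i]."""
    delta = centres[1] - centres[0]            # equally spaced level centres
    X = np.zeros((len(centres), len(Y)))
    for k, y in enumerate(Y):
        for i, h in enumerate(centres):
            if h - delta / 2 <= y < h + delta / 2:
                X[i, k] = 1.0
    return X

def estimate_transition(X):
    """Sample estimate of E{x_{k+1} x_k^T} (E{x_k x_k^T})^{-1}."""
    X_next, X_prev = X[:, 1:], X[:, :-1]
    cross = X_next @ X_prev.T                  # counts of j -> i transitions
    auto = X_prev @ X_prev.T                   # diagonal matrix of visit counts
    return cross @ np.linalg.pinv(auto)

centres = np.array([0.0, 1.0])                 # hypothetical level centres
Y = np.array([0.1, 0.9, 1.1, -0.2, 0.05, 1.2, 0.8, 0.1])   # toy observations
X = discretise(Y, centres)
A_hat = estimate_transition(X)
```

For one-hot state estimates, each column of `A_hat` is a relative-frequency estimate of a conditional distribution, which is why the columns sum to one.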

Lemma 2: Suppose that $F_n$ is a stable matrix obtained from $A_n$ using Condition 
3) of Lemma 1 and $P_n$ is a symmetric positive definite matrix, then the matrices 
(a) $Q_n = P_n - F_nP_nF_n^T$ and (b) (29) are positive definite. 


Proof: (a) Since $F_n$ is stable, it has an eigenvalue decomposition 
$F_n = U\Lambda U^{-1}$ where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is generally a 
complex diagonal matrix with $|\lambda_i| < 1$, $i = 1 \ldots n$. Let $F_n^H$ denote the 
Hermitian conjugate of $F_n$. We have 
$Q_n = P_n - F_nP_nF_n^H = P_n - U\Lambda U^{-1}P_n(U^{-1})^H\Lambda^HU^H$, i.e., 


$\tilde{Q}_n = \tilde{P}_n - \Lambda\tilde{P}_n\Lambda^H$, (30) 


where $\tilde{Q}_n = U^{-1}Q_n(U^{-1})^H$ and $\tilde{P}_n = U^{-1}P_n(U^{-1})^H$. Note that 
$\tilde{P}_n$ is positive definite if and only if $P_n$ is, and $\tilde{Q}_n$ is positive definite 
if and only if $Q_n$ is. Let Vec(.) denote the columns of a matrix stacked into a 
vector. Equation (30) implies 
$\mathrm{Vec}(\tilde{Q}_n) = (I - \Lambda^{*} \otimes \Lambda)\mathrm{Vec}(\tilde{P}_n) = ((I - \Lambda^H) \otimes (I - \Lambda))\mathrm{Vec}(\tilde{P}_n)$. 
Now a Hermitian matrix $W$ is positive definite if and only if $x^HWx > 0$ for all 
nonzero vectors $x$. This is equivalent to the condition that 
$(x^T \otimes x^H)\mathrm{Vec}(W) > 0$ for all nonzero vectors $x$. 


Now let $x \in \mathbb{C}^n$ be nonzero and consider 


$(x^T \otimes x^H)\mathrm{Vec}(\tilde{Q}_n) = (x^T \otimes x^H)((I - \Lambda^H) \otimes (I - \Lambda))\mathrm{Vec}(\tilde{P}_n)$ 
$= ((x^T(I - \Lambda^H)) \otimes (x^H(I - \Lambda)))\mathrm{Vec}(\tilde{P}_n)$ 
$= (y^T \otimes y^H)\mathrm{Vec}(\tilde{P}_n)$, (31) 


where $y = (I - \Lambda^H)x$ is an invertible mapping since $I - \Lambda^H$ is non-singular. 
It can be seen from (31) that since $\tilde{P}_n$ is positive definite, $\tilde{Q}_n$ is 
positive definite. 


(b) The result follows by considering 
$P_n = E\{\hat{x}_k\hat{x}_k^T\} - H_n^{\dagger}JJ^T(H_n^{\dagger})^T > 0$ within (a). 


11.4.4.3 Model Order Design 


The performance of the filter and smoother described below depends on the 
number of discretisation levels. Coarse discretisation can result in a loss of 
(recovered) signal fidelity. If the discretisation is too fine, the empirically 
derived transition matrices may be rank deficient, which can affect observability 
and reachability. Consider the observability matrix 


$O(F_n, H_n) = \begin{bmatrix} H_n \\ H_nF_n \\ \vdots \\ H_nF_n^{n-1} \end{bmatrix}$ 


and the reachability matrix 


$O(F_n^T, G_n^T)^T = [G_n \;\; F_nG_n \;\; \cdots \;\; F_n^{n-1}G_n]$. 


If $O(F_n^T, G_n^T)^T$ is rank deficient then not all modes of the linear system will 
be excited. If $O(F_n, H_n)$ is not of full rank then not all the states can be 
recovered from the system’s output. 


The statistical properties of quantised variables are surveyed in [27] and stochastic 
observability for HMM filters is addressed in [28]. However, the studies on 
quantisation and system identification [27] - [32], and observability [28] - [40] do 
not provide guidance on the choice for n. Consequently, a model-order reduction 
decision process is developed in the following. 


Checking whether $O(F_n, H_n)$ is rank deficient is equivalent to checking whether 
the function $f(n, O(F_n, H_n)) = n - \mathrm{rank}(O(F_n, H_n))$ is positive. It is noted 
under simplifying conditions below that this function increases monotonically 
with n. 


Lemma 3: Suppose that (i) $F_n$ is a diagonal matrix so that $f(n, O(F_n, H_n)) = 0$. 
Then $f(n+i, O(F_{n+i}, H_{n+i})) \le f(n+i+1, O(F_{n+i+1}, H_{n+i+1}))$ for integer 
$i \ge 0$. 


Proof: Since the diagonal of the transition probability matrix possesses zeros for 
integer $i \ge 0$, it follows that $\mathrm{rank}(F_{n+i+1}) = \mathrm{rank}(F_{n+i})$, from which the 
claim follows. 


It can similarly be shown that 
$f(n, O(F_n^T, G_n^T)^T) = n - \mathrm{rank}(O(F_n^T, G_n^T)^T)$ increases monotonically 
with n. Therefore, it is advocated that the model order is reduced whenever 
$(F_n, H_n)$ is unobservable or $(F_n, G_n)$ is unreachable, i.e., 


$n \leftarrow n - 1$ if $f(n, O(F_n, H_n)) > 0$ or $f(n, O(F_n^T, G_n^T)^T) > 0$. (32) 


By deflating the model to enforce observability and reachability, an identified 
linear system will have a minimal realisation (see Thm. 2.4-6 of [43]). 
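The rank tests behind (32) are straightforward to automate. The following sketch, assuming NumPy, builds the observability matrix, evaluates the rank-deficiency function $f$ and flags when the model order should be reduced. The function names and the two candidate output mappings are hypothetical; $F$ and $G$ are taken from Example 3.

```python
import numpy as np

def observability_matrix(F, H):
    """Stack H, HF, ..., HF^(n-1)."""
    n = F.shape[0]
    return np.vstack([H @ np.linalg.matrix_power(F, i) for i in range(n)])

def rank_deficiency(F, H):
    """f(n, O(F, H)) = n - rank(O(F, H)); zero means full rank."""
    n = F.shape[0]
    return n - np.linalg.matrix_rank(observability_matrix(F, H))

def needs_reduction(F, G, H):
    """True when (F, H) is unobservable or (F, G) is unreachable."""
    # Reachability is tested through the dual pair (F^T, G^T).
    return bool(rank_deficiency(F, H) > 0 or rank_deficiency(F.T, G.T) > 0)

pi = np.array([0.6, 0.4])
A = np.array([[0.8, 0.3], [0.2, 0.7]])
F = A - np.outer(pi, np.ones(2))
G = np.linalg.cholesky(np.diag(pi) - F @ np.diag(pi) @ F.T)

H_obs = np.array([[1.0, 2.0]])     # hypothetical mapping giving full rank
H_unobs = np.array([[1.0, 1.0]])   # hypothetical mapping with H F = 0
```

The second mapping is unobservable because $\mathbf{1}^TF = \mathbf{1}^TA - \mathbf{1}^T\pi\mathbf{1}^T = 0$, so the second block row of the observability matrix vanishes.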


11.4.4.4 Filter and Smoother Recursions 


Optimal causal estimates of the Markov chain states can be obtained by operating 
a minimum-variance filter [1] on the HMC measurement data $Y_k$ at time k. Let 
$\hat{x}_{k/k}$ and $\hat{x}_{k+1/k}$ respectively denote filtered and predicted state estimates 
given data at time k. These estimates are calculated as 


$\hat{x}_{k/k} = \hat{x}_{k/k-1} + PH_n^T\Omega^{-1}(Y_k - H_n\hat{x}_{k/k-1} - H_n\pi)$, (33) 
$\hat{x}_{k+1/k} = F_n\hat{x}_{k/k}$, (34) 


where $K = F_nPH_n^T\Omega^{-1}$ is the predictor gain, $\Omega = H_nPH_n^T + JJ^T$, in which 
$P \in \mathbb{R}^{n \times n}$ is the solution of the ARE 
$P = F_nPF_n^T - K\Omega K^T + G_nG_n^T$. The inclusion of $H_n\pi$ within (33) ensures that 
the filtered state estimates are zero-mean. The filtered output is given by 
$\hat{y}_{k/k} = H_n\hat{x}_{k/k} + H_n\pi$ which retains the properties common to the 
optimum solution, namely, it is unbiased and minimises the variance of the output 
estimation error [12]. Noncausal estimates of the Markov chain states can be 
obtained by operating an optimal fixed-interval smoother [12] on the HMC 
measurements over $k \in [1, N]$. The smoothed output estimates, $\hat{y}_{k/N}$, can be 
obtained from (34) and 


$\alpha_k = \Omega^{-1/2}(Y_k - H_n\hat{x}_{k/k-1} - H_n\pi)$, (35) 
$\xi_{k-1} = (F_n^T - H_n^TK^T)\xi_k + H_n^T\Omega^{-1/2}\alpha_k$, $\xi_N = 0$, (36) 
$\beta_k = -K^T\xi_k + \Omega^{-1/2}\alpha_k$, (37) 
$\hat{y}_{k/N} = Y_k - JJ^T\beta_k$, (38) 

where $\alpha_k, \beta_k \in \mathbb{R}$, $\xi_k \in \mathbb{R}^n$. Similarly, the smoothed output estimates 
(38) are unbiased and minimise the variance of the output estimation error [12]. 
Next, conditions are stated for the filter (33) - (34) and smoother (35) - (38) to 
be asymptotically stable. Let $\lambda_i(X)$ denote the $i$th eigenvalue of $X$. 


Lemma 4: Under the conditions of Lemmas 1 - 2, additionally assume that 
$(F_n, H_n)$ is observable, $(F_n, G_n)$ is reachable and $JJ^T > 0$, then 
$|\lambda_i(F_n - KH_n)| < 1$, $i = 1 \ldots n$. 


Proof: Condition 3) of Lemma 1 ensures that $|\lambda_i(F_n)| < 1$, $i = 1 \ldots n$, and 
Lemma 2 establishes that $G_nG_n^T > 0$, which together with the above additional 
assumptions satisfy the conditions of Thm. 2.1 of [42]. 
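The stability claim of Lemma 4 can be checked numerically for a small case. The sketch below, assuming NumPy, iterates the Riccati recursion to a fixed point and then inspects the eigenvalues of $F_n - KH_n$. The function name, the output mapping `H` and the noise covariance `R` (standing in for $JJ^T$) are hypothetical choices; fixed-point iteration is only one of several ways of solving the ARE.

```python
import numpy as np

def steady_state_gain(F, G, H, R, iterations=500):
    """Iterate P <- F P F^T - K Omega K^T + G G^T with K = F P H^T Omega^{-1}."""
    P = G @ G.T
    for _ in range(iterations):
        Omega = H @ P @ H.T + R
        K = F @ P @ H.T @ np.linalg.inv(Omega)
        P = F @ P @ F.T - K @ Omega @ K.T + G @ G.T
    Omega = H @ P @ H.T + R
    K = F @ P @ H.T @ np.linalg.inv(Omega)     # predictor gain at the fixed point
    return P, K, Omega

pi = np.array([0.6, 0.4])
A = np.array([[0.8, 0.3], [0.2, 0.7]])
F = A - np.outer(pi, np.ones(2))               # stable state matrix of Example 3
G = np.linalg.cholesky(np.diag(pi) - F @ np.diag(pi) @ F.T)
H = np.array([[1.0, 2.0]])                     # hypothetical observable mapping
R = np.array([[0.1]])                          # hypothetical JJ^T > 0

P, K, Omega = steady_state_gain(F, G, H, R)
spectral_radius = max(abs(np.linalg.eigvals(F - K @ H)))
are_residual = F @ P @ F.T - K @ Omega @ K.T + G @ G.T - P
```

A spectral radius below one indicates that the predictor error dynamics decay, which is the property the lemma asserts.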


The estimation of discrete Markov chain states is usually the purview of optimal 
Bayesian solutions such as HMM filters/smoothers and it may seem counter- 
intuitive that the linear filter and smoother can be employed for this task. It is 
verified below that the desired outputs can be recovered exactly when the 
measurement noise is negligible. 


Lemma 5: (a) $\lim_{JJ^T \to 0} \hat{y}_{k/k} = Y_k$; (b) $\lim_{JJ^T \to 0} \hat{y}_{k/N} = Y_k$. 


Proof: (a) It follows from (33) that 
$\hat{y}_{k/k} = (I - H_nPH_n^T\Omega^{-1})H_n\hat{x}_{k/k-1} + H_nPH_n^T\Omega^{-1}Y_k + (I - H_nPH_n^T\Omega^{-1})H_n\pi$, 
which together with $\lim_{JJ^T \to 0} H_nPH_n^T\Omega^{-1} = I$ yields 
$\lim_{JJ^T \to 0} \hat{y}_{k/k} = Y_k$. 


(b) By inspection of (38), $\lim_{JJ^T \to 0} \hat{y}_{k/N} = Y_k$. 


An understanding of why the filter becomes more precise at high SNR follows by 
recognising that the output estimator approaches a short circuit (and passes the 
measurements straight through). Thus filters and smoothers may be designed by 
discretising the observations (26), using the parameter estimates (28) - (29), 
and enforcing stability (Lemma 1), observability and reachability (32). 


The state correction (33), prediction (34) and output estimate respectively involve 
n, n and 1 inner products of n-dimensional quantities. Hence, the total filter 
calculation cost over a length-N interval is (2n+1)N inner products. The smoothed 
estimates are conveniently obtained by time-reversing the $\alpha_k$ (see [12]), in which 
case (4n+1)N inner products are required over the interval. Thus, any observed 
performance benefits need to be reconciled against the considerable increase in 
the number of calculations. In the event that the above calculation costs are 
prohibitive, it may be beneficial to design first-order filters and smoothers with 


$A_1 = H_nF_nH_n^{\dagger}$ 


instead of estimating an $A_1$ directly from (28). 


11.4.5 Application 


Improved position estimates are desired for tracking vehicles in cluttered mining 
environments. Raw Global Positioning System (GPS) measurements tend to be 
inaccurate whenever vehicle receivers operate during poor satellite geometry, 
within canyons and under obstructions. Multipath interference and radio spectrum 
congestion can also cause GPS signal degradation. 


Additional sensors can be used to improve land-based navigation performance 
including inertial navigation systems (INS), cameras and laser scanners. Optimal 
filters and smoothers [1] - [3], [12] are typically used to mitigate sensor noise. For 
example, optimal filters are routinely employed in integrated GPS-INS positioning 
applications such as [13]. 


Vehicle position tracking technologies that improve productivity and safety are 
desired within mining and allied transport industries. Mining vehicle position 
tracking problems often exhibit the following two characteristics. First, position 
estimates are corrupted by noise. Second, the traverses are repetitive. At export 
terminals, vehicles are driven along similar trajectories. An example that is 
motivated by tracking dozers on a coal stockpile at an export terminal is described 
in the following. 


Caterpillar D11 dozers are used to manage about 65 Mt of coal annually at the 
Port of Gladstone. When incoming coal is delivered by overhead conveyors, the 
dozers are required to push the coal onto stockpiles. Conversely, when coal is to 
be delivered to ships, the dozers are required to push the coal from the stockpiles 
to discharge feeders situated under the conveyors. The dozers repeatedly operate 
under the overhead conveyors and these overhead structures obscure the paths to 
GPS satellites and contribute to multipath interference. Techniques are therefore 
desired which improve the dozers’ noisy GPS position estimates. 


Zero-mean, unity-variance samples of northings, eastings and altitude positions 
reported by a Novatel OEMV-3-L1 RTK GPS receiver that travelled from the 
bottom to the top of a coal stockpile at Gladstone are shown in Fig. 2(a). A 
simulation study is now described in which measurement noise is added to the 
dozer position measurements. The objective of this study is to demonstrate that the 
filter (33) - (34) and smoother (35) - (38) can outperform conventional HMM and 
minimum-variance filters and smoothers for recovering position estimates from 
the noisy measurements. 


Fig. 2. Data for the position tracking example: (a) sample zero-mean unity-variance northings (i), 
eastings (ii) and altitude (iii) positions; (b) HMM filter (i), first-order minimum-variance filter (ii) and 
developed minimum-variance filter (iii) RMS position errors; (c) HMM smoother (i), first-order 
minimum-variance smoother (ii) and developed minimum-variance smoother (iii) RMS position errors. 


Independent Gaussian noise realisations were added to the GPS northings, 
eastings and altitudes. The northings, eastings and altitude measurements were 
quantised into n = 64 levels. An HMM filter and a fixed-interval HMM smoother 
were implemented using the techniques described in Section 11.3. In particular, 
the forward, backward and smoothed probabilities were respectively calculated 
from (6), (11) and (12), assuming Gaussian probability distributions (15) - (16). 
The observed HMM filter and HMM smoother RMS error versus signal to noise 
ratio (SNR) are indicated by lines (i) of Fig. 2(b) and Fig. 2(c), respectively. 


A first-order minimum-variance filter [1] - [3] and smoother [12] were also 
applied to the raw measurements. The RMS errors exhibited by the minimum- 
variance filter and smoother are indicated by lines (ii) of Fig. 2(b) and Fig. 2(c), 
respectively. It can be seen that the minimum-variance estimators outperform the 
HMM-based estimators when the model parameters are estimated from the 
available measurements. 


The procedure described above was used to identify the unknown $F_n$ and $Q_n$ 
from the initial state estimates (26). It was noticed that enforcing observability and 
reachability (32) reduced the number of quantisation levels to about n = 16 at low 
SNRs. The developed filter (33) - (34) and smoother (35) - (38) were applied to 
the measurements. The RMS positioning errors exhibited by the developed filter 
and smoother are indicated by lines (iii) of Fig. 2(b) and Fig. 2(c), respectively. It 
can be seen that the developed filter and smoother can provide better than 0.1 m 
RMS error improvement over 12 to 20 dB SNR. 


Thus, this example demonstrates that mean-square-error performance benefits can 
be attained by the developed filter and smoother. The benefit arises because the 
state-space model (20) - (21) includes a transition probability matrix and a residual 
$Gu_k$ (which is absent within the HMM approaches of [8] - [10]), and because the 
filter and smoother are minimum-variance designs. 


11.5 High-Order-Minimum-Variance HMM Filtering 
and Smoothing 


11.5.1 Overview 


In this section, the objective is to investigate filter formulations which may be able 
to provide further performance improvements in applications where some 
repetitions within measurement trajectories occur. A high-order signal model is 
proposed, in which the states comprise Kronecker products of Markov chains. 
Kronecker products are routinely employed for filtering multi-dimensional signals 
[47] and constructing large Markov models [48]. The Kronecker-product states 
may be plugged into the previously-described predictor, filter and smoother 
recursions. Although employing Kronecker tensor products results in a nonlinear 
combination of probability distributions, a signal model parameterisation is 
described below which enables optimal linear filters to be constructed. 


The inclusion of Markov chain states in an optimal filtering framework has been 
considered in the previous section. Here a signal model is proposed that possesses 
an additional level of abstraction or composition. In particular, an n-step Markov 
chain is described in which the states are Kronecker tensor products of n previous 
probability distributions and an $m^n \times m^n$ stochastic matrix. This approach 
endeavours to capture latent interdependencies between signals’ probability 
distributions in the pursuit of improved filter performance. 


Utilising Kronecker products of states from multiple previous time steps can 
provide improved prediction performance. Selecting the number of previous time 
steps remains an open problem. It is shown here that residual error variances can 
be used to compare candidate filter designs. 


The n-step Markov chain and an output estimation problem are defined in Section 
11.5.2. An optimal filter that estimates an n-step Markov chain from noisy 
measurements is developed in Section 11.5.3. A disadvantage of the proposed 
approach is a significant increase in the number of calculations. It is advocated 
that m and n can be selected by searching for the filter that exhibits minimum 
residual error. Section 11.5.4 discusses a coal shiploader application. It is 
demonstrated that an n-step Markov chain, minimum-variance filter can 
outperform conventional Kalman and HMM filters for estimating the surfaces of 
coal being loaded into a ship’s hold. 


11.5.2 Problem Definition 


11.5.2.1 Kronecker-Product Signal Model 


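Consider a vector $x_k = [x_{k,1}, x_{k,2}, \ldots, x_{k,m}]^T \in \mathbb{R}^m$, in which m is a 
positive integer. The Kronecker tensor product of integer n vectors $x_k$, $x_{k-1}$, 
..., $x_{k-n-1}$, for integer $k \ge n + 2$, is denoted by 


$x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1} \in \mathbb{R}^{m^n}$. (39) 


Operator precedence is omitted within (39) because the Kronecker tensor product 
is associative. It is convenient to denote the set of Kronecker products (39) over an 
interval of length N by 


$\mathcal{X} = [x_{n+2} \otimes x_{n+1} \cdots \otimes x_1, \; \cdots, \; x_N \otimes x_{N-1} \cdots \otimes x_{N-n-1}] \in \mathbb{R}^{m^n \times N}$. 


Similarly, let $u_k = [u_{k,1}, u_{k,2}, \ldots, u_{k,m}]^T \in \mathbb{R}^m$ and 


$\mathcal{U} = [u_{n+2} \otimes u_{n+1} \cdots \otimes u_1, \; \cdots, \; u_N \otimes u_{N-1} \cdots \otimes u_{N-n-1}] \in \mathbb{R}^{m^n \times N}$. 


The inner product of $\mathcal{U}$ and $\mathcal{X}$ is defined as 


$\langle \mathcal{U}, \mathcal{X} \rangle = \sum_{k=n+2}^{N} (u_k \otimes u_{k-1} \cdots \otimes u_{k-n-1})^T (x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1})$. 


The 2-norm of $\mathcal{X}$ is defined as $\|\mathcal{X}\|_2 = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$. 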


The Doob-Meyer Decomposition Theorem [11] states that a stochastic process 
may be decomposed into the sum of two parts, namely, a prediction and an input 
process. Therefore, a linear system $\mathcal{G}$ is postulated which operates on an input 
$\mathcal{W} = [w_1 \; w_2 \; \cdots \; w_N] \in \mathbb{R}^{m^n \times N}$, $w_k \in \mathbb{R}^{m^n}$, and produces 
an output $\mathcal{Y} = [y_1 \; y_2 \; \cdots \; y_N] \in \mathbb{R}^{1 \times N}$, $y_k \in \mathbb{R}$, i.e., 
$\mathcal{Y} = \mathcal{G}\mathcal{W}$. It is assumed that $\mathcal{G}$ has states realised by 


$x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1} = A^{(n)} x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2} + w_k$, (40) 


352 Chapter 11 Hidden Markov Model Filtering and Smoothing 


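where $A^{(n)} \in \mathbb{R}^{m^n \times m^n}$ is a transition probability matrix and $w_k$ is a 
time-homogeneous stochastic input process with $E\{w_kw_k^T\} = Q^{(n)}$. It is also 
assumed that $x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2}$ are not observed directly but are 
hidden. The system output, $y_k \in \mathbb{R}$, is modelled as 


$y_k = C^{(n)} x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1}$, (41) 


where $C^{(n)} \in \mathbb{R}^{1 \times m^n}$. For example, consider a second system $\mathcal{H}$ of 
compatible dimensions that is driven by an input $\mathcal{Z}$ and having an output 
$\mathcal{U} = \mathcal{H}\mathcal{Z}$. From the properties of Kronecker tensor products [45] it can be 
shown that 
$\mathcal{Y} \otimes \mathcal{U} = \mathcal{G}\mathcal{W} \otimes \mathcal{H}\mathcal{Z} = (\mathcal{G} \otimes \mathcal{H})(\mathcal{W} \otimes \mathcal{Z})$. 
That is, the eigenvalues of $\mathcal{G} \otimes \mathcal{H}$ arise as a product of the eigenvalues of 
$\mathcal{G}$ and $\mathcal{H}$ [45]. Thus, a system possessing Kronecker product states will be 
asymptotically stable if it is a product of asymptotically stable systems. 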
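The two Kronecker-product properties used above can be illustrated with small matrices standing in for the systems. The sketch below assumes NumPy; the matrices `G1` and `G2` and the vectors `w` and `z` are arbitrary illustrative values, not quantities from the text. It checks the mixed-product rule and that the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues.

```python
import numpy as np

# Two stable matrices (triangular, so the eigenvalues are the diagonal entries).
G1 = np.array([[0.5, 0.1],
               [0.0, 0.3]])
G2 = np.array([[0.9, 0.0],
               [0.2, 0.4]])

# Eigenvalues of the Kronecker product versus all pairwise products.
kron_eigs = np.sort(np.linalg.eigvals(np.kron(G1, G2)).real)
products = np.sort(
    np.outer(np.linalg.eigvals(G1), np.linalg.eigvals(G2)).ravel().real
)

# Mixed-product rule: (G1 w) kron (G2 z) = (G1 kron G2)(w kron z).
w = np.array([1.0, -2.0])
z = np.array([0.5, 3.0])
lhs = np.kron(G1 @ w, G2 @ z)
rhs = np.kron(G1, G2) @ np.kron(w, z)
```

Since every eigenvalue of each factor has magnitude below one, so does every product, which is the stability observation made in the text.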


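The adjoint of $\mathcal{G}$, denoted by $\mathcal{G}^H$, satisfies 

$\langle \mathcal{U}, \mathcal{G}\mathcal{W} \rangle = \langle \mathcal{G}^H\mathcal{U}, \mathcal{W} \rangle$ 

for all $\mathcal{U}$ and $\mathcal{W}$ of compatible dimensions, where $(.)^H$ denotes the adjoint 
operator. Adjoint systems are required in the optimal filter and smoother 
derivations of [12]. 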


11.5.2.2 High-Order Markov Chains 


Definition 1: Suppose there exists a probability distribution of a discrete-time 
vector $x_k = [x_{k,1}, x_{k,2}, \ldots, x_{k,m}]^T \in \mathbb{R}^m$ satisfying: (i) 
$0 \le x_{k,i} \le 1$, $i = 1, \ldots, m$; and (ii) $\sum_{i=1}^{m} x_{k,i} = 1$. Then $x_k$ is 
termed a probability distribution vector. 


Two useful observations about Kronecker products of probability distribution 
vectors are stated below. 


Lemma 6 [46]: The elements of the vector 
$x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1} \in \mathbb{R}^{m^n}$ satisfy the properties of a 
probability distribution vector. 


G. A. Einicke, Smoothing, Filtering and Prediction: Estimating 353 
the Past, Present and Future (2 ed.), Prime Publishing, 2019 


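Proof: (i) The non-negativity property follows by inspection. (ii) It is easily 
verified that 


$\sum_{i=1}^{m^n} (x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1})_i = \left(\sum_{i=1}^{m} x_{k,i}\right)\left(\sum_{i=1}^{m} x_{k-1,i}\right) \cdots \left(\sum_{i=1}^{m} x_{k-n-1,i}\right) = 1$. 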


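Lemma 7: Let 


$C_1^{(n)} = \begin{bmatrix} 1 \; 1 \cdots 1 & 0 \; 0 \cdots 0 & \cdots & 0 \; 0 \cdots 0 \\ 0 \; 0 \cdots 0 & 1 \; 1 \cdots 1 & \cdots & 0 \; 0 \cdots 0 \\ \vdots & & \ddots & \vdots \\ 0 \; 0 \cdots 0 & 0 \; 0 \cdots 0 & \cdots & 1 \; 1 \cdots 1 \end{bmatrix} \in \mathbb{R}^{m \times m^n}$. 


Then 


$x_k = C_1^{(n)} x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1}$. (42) 


Proof: The claim can be verified by evaluating (42) and using 
$\sum_{i=1}^{m} x_{k,i} = 1$. 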
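Lemmas 6 and 7 can be illustrated with two probability vectors. The sketch below assumes NumPy; the function names and the example vectors are hypothetical. The matrix of Lemma 7 is built as an identity matrix Kronecker-multiplied by a row of ones, which is one convenient way of producing the block pattern of ones shown in the lemma.

```python
import numpy as np
from functools import reduce

def kron_state(vectors):
    """Kronecker product of a list of vectors, taken left to right."""
    return reduce(np.kron, vectors)

def marginalising_matrix(m, num_factors):
    """I_m kron 1^T: sums out every factor after the first one."""
    return np.kron(np.eye(m), np.ones((1, m ** (num_factors - 1))))

# Two hypothetical probability distribution vectors with m = 3.
x_k = np.array([0.2, 0.5, 0.3])
x_km1 = np.array([0.6, 0.1, 0.3])

state = kron_state([x_k, x_km1])     # element of R^(m^2)
C1 = marginalising_matrix(3, 2)      # 3 x 9 matrix of block rows of ones
recovered = C1 @ state               # should reproduce x_k
```

The recovery works because each block of ones sums one element of `x_k` multiplied by all elements of `x_km1`, and the latter sum to one.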
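Without loss of generality, suppose that $x_k, x_{k-1}, \ldots, x_{k-n-1}$ within (40) are 
orthogonal basis vectors in $\mathbb{R}^m$. Let us now consider $A^{(n)}$. Post-multiplying (40) 
by $(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})^T$, taking the expectation over 
$k \in [1, N]$ and rearranging yields 


$A^{(n)} = E\{(x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1})(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})^T\}$ 
$\times (E\{(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})^T\})^{-1}$. (43) 


It is shown below that the $(i,j)$th element of $A^{(n)}$, denoted by $a_{ij}^{(n)}$, is the 
probability of transitioning from state 
$x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2} = j$ to state 
$x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1} = i$. 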


Lemma 8 [46]: The above $A^{(n)}$ is a column stochastic transition probability 
matrix. 


Proof: Expanding (43) yields 


$A^{(n)} = N E\{(x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1})(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})^T\}$ 
$\times (N E\{(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})(x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})^T\})^{-1}$, 


in which 


$a_{ij}^{(n)} = \dfrac{\sum_{k=1}^{N} (x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1})_i (x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})_j}{\sum_{k=1}^{N} (x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})_j^2}$ 
$= \dfrac{\sum_{k=1}^{N} (x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1})_i (x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})_j}{\sum_{k=1}^{N} (x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2})_j}$ 
$= \Pr(x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1} = i \mid x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2} = j)$. (44) 


As a consequence of (44), the vector $x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1}$ within (40) is 
termed an n-step Markov chain. For example, $x_k = A^{(1)}x_{k-1}$ and 
$x_k \otimes x_{k-1} = A^{(2)}x_{k-1} \otimes x_{k-2}$ are called 1-step and 2-step Markov chains, 
respectively. 


The representation (40), (41) defines a linear system in which 
$x_{k-1} \otimes x_{k-2} \cdots \otimes x_{k-n-2}$ is the state vector. It is assumed herein that the 
$x_k$ within (40) are probability distribution vectors. From Lemma 8, the Kronecker 
products within (40), (41) are also probability distribution vectors. 
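A sample version of (43) for the 2-step example $x_k \otimes x_{k-1} = A^{(2)}x_{k-1} \otimes x_{k-2}$ is sketched below, assuming NumPy. The function names and the toy state sequence are hypothetical, and a pseudo-inverse replaces the inverse (an assumption of this sketch) so that pair states which never occur do not cause a failure.

```python
import numpy as np

def one_hot(indices, m):
    """Columns are orthogonal basis vectors in R^m selected by the indices."""
    X = np.zeros((m, len(indices)))
    X[indices, np.arange(len(indices))] = 1.0
    return X

def two_step_transition(indices, m):
    """Sample estimate of A^(2) from a sequence of state indices."""
    X = one_hot(indices, m)
    # Column k-1 of S holds x_k kron x_{k-1}, an element of R^(m^2).
    S = np.stack(
        [np.kron(X[:, k], X[:, k - 1]) for k in range(1, X.shape[1])], axis=1
    )
    S_next, S_prev = S[:, 1:], S[:, :-1]
    return (S_next @ S_prev.T) @ np.linalg.pinv(S_prev @ S_prev.T)

seq = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1]   # toy state sequence with m = 2
A2 = two_step_transition(seq, 2)
col_sums = A2.sum(axis=0)
```

In this toy sequence every pair state is visited and always followed by the same successor, so each column of `A2` contains a single one; in general the columns hold relative transition frequencies.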


11.5.2.3 Output Estimation 


Suppose that signals are generated by the system (40), (41) and that observations 


$z_k = y_k + v_k$, (45) 


are available, where $v_k \in \mathbb{R}$ is a stationary measurement noise sequence with 
variance $\sigma_v^2$. It is desired to develop a causal solution $\mathcal{H}$ which operates on 
measurements $\mathcal{Z} = [z_1, z_2, \cdots z_N]$ and produces filtered estimates $\hat{y}_{k/k}$ of 
$y_k$. The filter performance objective is to minimise the 2-norm of the causal part 
of the output estimation error, i.e., 


$\|\{\mathcal{Y} - \mathcal{H}\mathcal{Z}\}_+\|_2$, (46) 


where $\{.\}_+$ denotes the causal part. 


11.5.3 Optimal Estimation of an n-Step Markov Chain 


11.5.3.1 Filter Realisation 


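The optimal minimum-variance filter for the output estimation problem (40), (41), 
(45), (46) is given by 


$\mathcal{H} = \{\mathcal{G}Q^{(n)}\mathcal{G}^H(\mathcal{G}Q^{(n)}\mathcal{G}^H + \sigma_v^2)^{-1}\}_+$, (47) 


as shown in [12]. Let $\hat{x}_{k/k-1} \otimes \hat{x}_{k-1/k-1} \cdots \otimes \hat{x}_{k-n-1/k-1}$ and 
$\hat{x}_{k/k} \otimes \hat{x}_{k-1/k} \cdots \otimes \hat{x}_{k-n-1/k}$ denote the predicted and filtered 
estimates of $x_k \otimes x_{k-1} \cdots \otimes x_{k-n-1}$ at time k, respectively. These estimates 
are calculated as 


$\hat{x}_{k/k} \otimes \hat{x}_{k-1/k} \cdots \otimes \hat{x}_{k-n-1/k} = \hat{x}_{k/k-1} \otimes \hat{x}_{k-1/k-1} \cdots \otimes \hat{x}_{k-n-1/k-1}$ 
$+ L(z_k - C^{(n)}\hat{x}_{k/k-1} \otimes \hat{x}_{k-1/k-1} \cdots \otimes \hat{x}_{k-n-1/k-1} - E\{z_k\})$, (48) 


$\hat{x}_{k+1/k} \otimes \hat{x}_{k/k} \cdots \otimes \hat{x}_{k-n/k} = A^{(n)}\hat{x}_{k/k} \otimes \hat{x}_{k-1/k} \cdots \otimes \hat{x}_{k-n-1/k}$, (49) 


where $L = P(C^{(n)})^T\Omega^{-1}$ is the filter gain, $\Omega = C^{(n)}P(C^{(n)})^T + \sigma_v^2$, in 
which $P$ is the solution of the algebraic Riccati equation 
$P = A^{(n)}P(A^{(n)})^T - A^{(n)}LC^{(n)}P(A^{(n)})^T + Q^{(n)}$. The filtered output 
estimates are given by 


$\hat{y}_{k/k} = C^{(n)}\hat{x}_{k/k} \otimes \hat{x}_{k-1/k} \cdots \otimes \hat{x}_{k-n-1/k} + E\{z_k\}$. (50) 


The nonzero mean $E\{z_k\}$ can (for example) be obtained by calculating a moving 
average alongside the filter recursions. It is asserted that the above output estimate 
satisfies the previously-stated filtering performance objective. 


Lemma 9: The estimate (50) minimises $\|\{\mathcal{Y} - \mathcal{H}\mathcal{Z}\}_+\|_2$. 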


The proof is omitted since it follows mutatis mutandis from [12]. The above signal 
model could be similarly applied within smoothers (such as [12]) to obtain further 
performance benefits. 


11.5.3.2 Design Procedure 


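It is assumed that initial estimates of $x_k$ may be calculated as $\hat{x}_k = [\hat{x}_{1,k}, \ldots, \hat{x}_{m,k}]^T = g(z_k)$,
where $g: \mathbb{R} \to \mathbb{R}^m$ denotes a function which discretises $y_k$
into $m$ equally-spaced levels and produces estimates of $x_k \in \mathbb{R}^m$, i.e.,


$$\hat{x}_{i,k} = \begin{cases} 1 & \text{if } c_i - \Delta/2 < z_k < c_i + \Delta/2, \\ 0 & \text{otherwise,} \end{cases} \qquad (51)$$

where

$$C_2 = [c_1 \; c_2 \; \cdots \; c_m] \in \mathbb{R}^{m} \qquad (52)$$


is a row vector of discretisation level centers that are spaced $\Delta$ apart. Estimates of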
the Kronecker tensor product states may then be calculated as 
X, OX, Oxy. 
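
A minimal Python sketch of the discretisation (51) and of forming a Kronecker product state is given below. It assumes NumPy; the function name discretise, the level centres and the two measurement values are hypothetical. Measurements are assigned to the nearest level centre, which agrees with (51) inside the covered range and additionally clamps out-of-range values to the end levels (an assumption made here for convenience).

```python
import numpy as np

def discretise(z, centres):
    """Return a one-hot vector flagging the level centre nearest to z."""
    centres = np.asarray(centres, dtype=float)
    x = np.zeros(centres.size)
    x[np.argmin(np.abs(centres - z))] = 1.0
    return x

centres = np.array([-1.5, -0.5, 0.5, 1.5])   # m = 4 levels spaced 1.0 apart
x_k = discretise(0.4, centres)               # current measurement
x_km1 = discretise(-0.7, centres)            # previous measurement
x_kron = np.kron(x_k, x_km1)                 # Kronecker product state, m**2 elements
```

The product of two one-hot vectors is again one-hot, so each element of x_kron flags one particular pair of consecutive levels.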


A least-squares estimate of the transition probability matrix for n discretisation 
levels is given by [33]: 


Ams E{(x, @x, °° @ eee (Xp. @ Xp_9 11 @Xj_p_-r)$ 
(53) 
XE{(X,-9 @X p91 @ Xp y-2 ie (X,_7 OX 2° @ Sea . 
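
For intuition, a first-order (n = 1) analogue of the least-squares estimate can be sketched in Python as a ratio of sample correlations of one-hot state vectors. The function name, the zero-based state labels and the short sequence are hypothetical, NumPy is assumed, and a column-stochastic convention (columns summing to one) is assumed here.

```python
import numpy as np

def estimate_transition_matrix(states, m):
    """Least-squares estimate from a sequence of integer state labels 0..m-1."""
    X = np.eye(m)[:, states]              # columns are one-hot state vectors
    X_next, X_prev = X[:, 1:], X[:, :-1]
    num = X_next @ X_prev.T               # sum of x[k+1] x[k]'
    den = X_prev @ X_prev.T               # sum of x[k] x[k]'
    return num @ np.linalg.pinv(den)      # pseudo-inverse copes with unvisited states

states = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]   # hypothetical observed sequence
A_hat = estimate_transition_matrix(states, 2)
```

With one-hot states the estimate coincides with normalised transition counts, so element (i, j) of A_hat is the observed fraction of moves from state j to state i.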


It is tacitly assumed that the G within the optimum solution (47) is causal or
stable. However, the transition probability matrix (53) has a maximum
eigenvalue of 1, which results in G being marginally unstable. This suggests that
the filter is at risk of being marginally unstable. A method for removing the
potential instability is described in [23] and Lemma 1.
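
The unit eigenvalue is easily confirmed numerically. The sketch below assumes NumPy and a hypothetical two-state column-stochastic matrix, and computes the largest eigenvalue magnitude.

```python
import numpy as np

# Hypothetical column-stochastic transition probability matrix.
A = np.array([[0.5, 0.4],
              [0.5, 0.6]])
eigenvalues = np.linalg.eigvals(A)
spectral_radius = max(abs(eigenvalues))   # equals 1 for a stochastic matrix
```

Since each column sums to one, a vector of ones is a left eigenvector with eigenvalue 1, which is why the model sits on the stability boundary.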


We seek to extract yz from the Kronecker tensor product states 
xX, Ox, ,°:'@x,,,. Since the elements of x, pertain to mutually exclusive 


transitions, the outputs are calculated as the expected payoff, i.e., $y_k = C_2 x_k$, where
$C_2$ is defined in (52). Thus, the output mapping can be calculated as


$$C^{(n)} = C_2^{(n)}, \qquad (54)$$


where $C_2^{(n)}$ is defined within Lemma 7. The above mapping affords a linear
estimation problem which dispenses with the need for an extended Kalman filter
for recovering $y_k$ from the nonlinear combination of probability distributions
within (2).


Since the assumed system (41), (42) is time-invariant, an input sequence 
covariance can be obtained from the — steady-state — estimate: 


Qh = (I-A) EC! x, Ox, OX.) 
x(CTx, @x,_,7@x,_,.,) }U-A”)’, where (.)' denotes the Moore-Penrose 
pseudo-inverse. 


Guidance is desired for selecting the number of discretisations $m$ and the number
of tensor products $n$. To this end, a method is suggested below for comparing
candidate filter designs. Let $\hat{A}^{(n)}$ and $\hat{C}^{(n)}$ denote available estimates of the actual
$A^{(n)}$ and $C^{(n)}$, respectively.


Lemma 10 [46]: Let Xie @ Xp OX on ve = 

XOX ON Kept OX nel Xie and 
~ ~ ~ ~ ~ ~ T 

P= EXK ij OX paa OX nana) Kea Xana Xena) $ 

respectively denote the state prediction error and its stationary covariance. If 


A™ = A™ and C™ = C™ the residual error variance 


Eo Ais = CM PRC): +R 
(55) 
is minimised. 


Proof: Subtracting (49) from (40) and rearranging yields 
Kei ® Kat a Ken Akl 


= (n) (n)\~ ~ as 
= (A — KO VX @ Xp 201 @ Xena +KY, + w, 


+(Ae = A” — (CR = CY WX pana @ Xp aia @ Xpn-21ke-1« 
Constructing the state prediction error covariance gives 
P=(A” —~KC”)P(A” ~KC”)' + Ko K' +0 

+(A” — Am =K(c _ c”)) P, 


x(A” 2 AM =Kce™ 2 & Cc yy 
+(A ~KC)P, (A” a ~K(c™ = cm) 


+(A AM SKC” i c™)) py (A AKO ; 
(56) 


where P2 = EX( X41 @ Xana? @ Xn Gna B Mana ® Moe dee) and 
P3 = BY (X pip @ Xp OX pare Kea @ Xana Olay 1 The claim 
follows by inspection of (56). 


The filter error covariance is affine to the prediction error covariance (see [1]). 
Thus, the conditions of Lemma 10 also minimise the filter error. Suppose that 
signal models with different m and n have compatibly-dimensioned state space 
matrices which are padded with zeroes. Then Lemma 10 suggests that candidate 
filter designs could be assessed by comparing their residual error variances. The 
advantage of comparing residual error variances is demonstrated within the 
example that follows. 
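
The idea of ranking candidate designs by their residual error variances can be illustrated with a much simpler stand-in. The Python sketch below, which assumes NumPy, simulates a scalar first-order process and runs steady-state one-step predictors designed with three candidate values of the state parameter. The scalar model, the function name residual_variance and all numerical values are hypothetical; they merely mimic the situation in which the estimated model matrices may differ from the actual ones.

```python
import numpy as np

def residual_variance(z, a_hat, q, r, iters=200):
    """Sample variance of the residuals of a steady-state scalar predictor
    designed with the candidate parameter a_hat."""
    p = q
    for _ in range(iters):                     # scalar Riccati iteration
        p = a_hat**2 * p * r / (p + r) + q
    gain = p / (p + r)
    x_pred, residuals = 0.0, np.empty(len(z))
    for k, zk in enumerate(z):
        residuals[k] = zk - x_pred             # residual (innovation)
        x_pred = a_hat * (x_pred + gain * residuals[k])
    return float(np.var(residuals))

rng = np.random.default_rng(0)
a, q, r, n = 0.9, 1.0, 1.0, 20000              # hypothetical "true" model
x, z = 0.0, np.empty(n)
for k in range(n):
    z[k] = x + rng.normal(scale=np.sqrt(r))
    x = a * x + rng.normal(scale=np.sqrt(q))

candidates = {a_hat: residual_variance(z, a_hat, q, r) for a_hat in (0.2, 0.6, 0.9)}
best = min(candidates, key=candidates.get)     # candidate with smallest residual variance
```

The candidate that matches the simulated model yields the smallest residual variance, which is the selection rule that Lemma 10 suggests.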


11.5.4 Application 


Assisting shiploading requires identification of the cargo hold of a ship and 
continuous estimation of the volume of coal residing therein. The application of a 
LIDAR for estimating the volume of loaded material is described in [50] — [54]. 


A SICK LD-LRS3611 LIDAR was mounted on an EXLAR Tritex II rotator that 
was installed above the chute of an operational coal shiploader. The LIDAR was 
configured to scan a 360° field of view at 0.25° increments (i.e., 1440 scans) and a 
10 Hz rotation rate within the vertical plane. The EXLAR Tritex II rotator angles 
were varied from −35° to +260° at 0.83° increments (i.e., 356 angles) in the 
horizontal plane. The 356×1440 measurement records were stored in a MySQL 
database over 40-hour-long shiploading events and monitored remotely. An 
accumulated LIDAR point cloud model of a ship’s hold during loading is shown 
in Fig. 3. A heap of coal can easily be identified in the centre of the figure. 
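
The quoted record dimensions follow from the scan settings, as the short Python calculation below indicates. It assumes that both end angles of the rotator sweep are included in the count; the variable names are illustrative.

```python
import math

scans_per_revolution = round(360 / 0.25)                  # vertical-plane scans
rotator_angles = math.floor((260 - (-35)) / 0.83) + 1     # horizontal-plane angles
records_per_sweep = rotator_angles * scans_per_revolution
```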


Coal dust spills out of the chute and accumulates on the LIDAR lens, which 
degrades the measurements. To demonstrate the efficacy of the described filter, 
independent Gaussian noise realisations were added to the LIDAR measurements. 
For an n-step, m-discretisation-level Markov model, an $m^n \times m^n$ transition 
probability matrix needs to be identified from the data. The performance of the 
developed filter depends on the Markov model assumptions and the prevailing 
signal-to-noise ratio (SNR). The candidate assumptions considered here include 
1-step, 2-step and 3-step Markov chains together with m = 16 discretisations. The 
above design procedures were used to implement filters that operate on the 
individual 356 vertical-plane measurements. 


The 2-step Markov chain filter was found to provide the best residual error 
variance performance (see Lemma 5) over -10 to 10 dB SNR. It was observed that 
the transition probability matrix is approximately tri-diagonal, which suggests that 
about $3/2 \times 16^2$ transition probability matrix elements were estimated. It is 
known that the Cramér-Rao lower bound for estimating a parameter of a signal in 
white Gaussian noise is proportional to the noise variance [49]. Thus, signal 
complexity, SNR and calculation cost need to be considered in the design of {m, n}. 


For comparison, a first-order Kalman filter was designed by assuming a model of 
the form 


$$x_{k+1} = F x_k + w_k, \qquad (57)$$
$$z_k = x_k + v_k, \qquad (58)$$


in which $F, \sigma_w^2 \in \mathbb{R}$ were estimated from the measurements. The RMSE 
observed for the first-order Kalman filter versus SNR is indicated by the line (i) of 
Fig. 4. It can be seen that the minimum-error-residual 2-step Markov chain filter 
(line (ii)) performs better than the first-order filter. 
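
For reference, a first-order Kalman filter for a model of the form (57), (58) can be sketched in Python as follows. NumPy is assumed. The function name kalman_filter_1d and the parameter values are hypothetical; in particular, the values are fixed by hand here rather than estimated from measurements, and the data are simulated rather than LIDAR records.

```python
import numpy as np

def kalman_filter_1d(z, F, q, r, x0=0.0, p0=1.0):
    """Filtered estimates for x[k+1] = F x[k] + w[k], z[k] = x[k] + v[k]."""
    x_pred, p_pred = x0, p0
    x_filt = np.empty(len(z))
    for k, zk in enumerate(z):
        gain = p_pred / (p_pred + r)
        x_filt[k] = x_pred + gain * (zk - x_pred)   # measurement update
        p_filt = (1.0 - gain) * p_pred
        x_pred = F * x_filt[k]                      # time update
        p_pred = F * p_filt * F + q
    return x_filt

rng = np.random.default_rng(1)
F, q, r, n = 0.95, 0.05, 1.0, 5000    # hypothetical values, not estimated from data
x = np.empty(n)
x[0] = 0.0
for k in range(1, n):
    x[k] = F * x[k - 1] + rng.normal(scale=np.sqrt(q))
z = x + rng.normal(scale=np.sqrt(r), size=n)

rmse_raw = np.sqrt(np.mean((z - x) ** 2))
rmse_filtered = np.sqrt(np.mean((kalman_filter_1d(z, F, q, r) - x) ** 2))
```

Comparing rmse_filtered with rmse_raw gives the kind of RMSE figure that is plotted against SNR in a performance comparison.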
Fig. 3. Sample snapshot of a LIDAR point cloud of a ship’s hold showing heaped coal.

Fig. 4. Filter performance comparison: (i) first-order Kalman filter; (ii) minimum-error-residual 
2-step Markov chain filter; (iii) HMM filter with $A$ estimated from $z_k$; (iv) HMM filter with $A$ 
estimated from $y_k$.


In an HMM approach, the signal amplitude range is discretised into m levels 
centered around points $q_i$, i = 1, …, m. An HMM filter and a fixed-interval HMM 
smoother were implemented using the techniques described in Section 11.3, 
namely, using (6) - (9), assuming Gaussian probability distributions (15) - (16). 
The performance exhibited by a “blind” HMM filter, in which m = 16 and $A$ was 
estimated from the measurements $z_k$, is indicated by line (iii) of Fig. 4. Better 
HMM filter performance was obtained by estimating $A$ from the noiseless LIDAR 
signal $y_k$, as is indicated by line (iv) of Fig. 4. 


It can be seen that the minimum-error-residual 2-step Markov chain filter 
outperforms the Kalman filter and the two HMM filters. Note that a conventional 
Kalman filter minimises the mean square error (MSE) but it does not exploit 
transition probabilities. Although a conventional HMM filter possesses a transition 
probability matrix, it does not explicitly minimise the MSE. The developed n-step 
Markov chain filter provides a performance benefit because it exploits transition 
probability information and is a minimum-MSE design. It also employs Kronecker 
tensor product states which allow interdependencies between signals’ probability 
distributions to be captured. However, there is a significant increase in calculation 
cost, namely, the filters require $(2m^n + 1)N$ inner products of $m^n$-dimensional 
quantities. 
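
The growth in model size and calculation cost with n can be tabulated with a few lines of Python. The function names are illustrative, and N is interpreted here as the number of processed samples, which is an assumption; the value 1440 is used only as an example.

```python
def state_dimension(m, n):
    """Dimension of the Kronecker product state for m levels and n steps."""
    return m ** n

def inner_products(m, n, num_samples):
    """Inner product count (2 m^n + 1) N quoted for the filter."""
    return (2 * m ** n + 1) * num_samples

for n in (1, 2, 3):
    dim = state_dimension(16, n)
    # Columns: n, state dimension, transition matrix elements, inner products.
    print(n, dim, dim * dim, inner_products(16, n, 1440))
```

For m = 16 the state dimension grows from 16 to 256 to 4096 as n increases from 1 to 3, which illustrates why calculation cost enters the choice of {m, n}.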


Note that an n-step Markov chain can similarly be used within fixed-lag and fixed- 
interval minimum-variance, HMM and minimum-variance-HMM smoothers. This 
is expected to provide incremental performance gains at the expense of further 
calculations. 


11.6 Chapter Summary 


This chapter shows how observed sequences of patterns may be exploited to 
improve signal recovery performance. To this end, HMM, minimum-variance 
HMM, and high-order-minimum-variance-HMM filter/smoother derivations are 
set out. These estimators are summarised in Table 1. 


It is illustrated with the aid of examples that HMM estimators can outperform the 
optimal minimum-variance solutions at low-SNR, provided that noiseless 
“training data” is available for parameter estimation. The examples also 
demonstrate that discretisation (or quantisation) degrades signal fidelity and 
performance at high SNR. 


The minimum-variance-HMM estimators arise by employing transition 
probability matrices and probability distribution states within the standard 
minimum-variance filter/smoother recursions. MSE benefits can arise because the 
estimators assume the presence of a process noise and are minimum-error- 
variance designs. However, the model order (or the number of 
discretisation/quantisation levels) needs to be optimised by the designer. A low- 
order model can compromise signal fidelity. If the model order is too high, the 
transition probability matrix can become rank deficient. Consequently, deflating 
the model to enforce observability and reachability is advocated. 
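
Rank conditions of this kind can be checked numerically before a design is accepted. The Python sketch below assumes NumPy and uses a hypothetical three-state transition probability matrix with two identical columns, a hypothetical output mapping C and, for want of a specified input mapping, an identity matrix B. It only shows how the checks may be carried out and does not implement a deflation procedure.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(N-1) for an N-state model."""
    N = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(N)])

def reachability_matrix(A, B):
    """Place B, AB, ..., A^(N-1)B side by side."""
    N = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(N)])

A = np.array([[0.5, 0.5, 0.2],
              [0.3, 0.3, 0.3],
              [0.2, 0.2, 0.5]])      # first two columns identical
C = np.array([[-1.0, 0.0, 1.0]])     # hypothetical level centres
B = np.eye(3)                        # assumed input mapping
rank_A = np.linalg.matrix_rank(A)
rank_obs = np.linalg.matrix_rank(observability_matrix(A, C))
rank_reach = np.linalg.matrix_rank(reachability_matrix(A, B))
```

A rank below the number of states in either test would flag a model that is a candidate for deflation.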


Transition probability matrix 
A= {ay} € RM 
G5 = PLX 4 = 45 |X, = 955 


HMM 
=C (k 
N 
Smoothed probability distributions 
AGLAG) 
i=argmax (i), Xyp =4;- 
Isis 
Signal model 
Xp = Fx, + Guy, 
Y, = H,x, + Jw, 


a=F 0 


Ay=[hy,.. An] 


Minimum-variance HMM 


Filter recursions 
San = Xena + PHO"'(Y, —FE,X,44—H,7) , 
Sia = Fu 
Smoother recursions 
O, = OP (Y, — A X4n1- 2) 
Sea = (Fy Gs HIK')é&, + HOP a, > 
he = -K'é, rts mate 
X, OX OX 4 


2 = Ve + Vy 
Liat 0,030 
wy [050,20 Lhe 
Crs 
0,0,---0 


Cy = [6,5€75-5 En] 


(n) = (n) 
C GC 


High-order-minimum-variance HMM 


— 4”) 
= AX OX 9° ON 9 + Wes 


-c—™ 
Vp = OMX 4 ON g* OX 2 


Similar to above application of the minimum- 
variance filter and smoother recursions to the 
signal model 


Table 1. Summary of HMM, minimum-variance HMM and high-order-minimum-variance HMM 
filter/smoother recursions. 


Optimal prediction conventionally involves operating on the state vector from one 
previous time step. Although it may be appealing to contemplate 
higher-dimensional matrix algebra to generalise predictions from multiple previous time 
steps, formulating the associated Riccati equations would become unwieldy. A 
more elegant and tractable approach is to simply work with Kronecker tensor 
products of previous states. Indeed, this is done in the development of the 
so-called high-order-HMM filter/smoother. This approach allows more transition 
probability interdependencies to be captured and exploited. 


If the applications at hand are insensitive to calculation cost and suboptimal 
estimation performance is acceptable then experimenting with arbitrary parameter 
values within the minimum-variance solutions may suffice. It may be possible to 
reduce the MSE by instead employing the well-known least-squares (or better, 
unbiased-least-squares) parameter estimates. Conversely, if performance is 
paramount and higher calculation overheads can be tolerated then including 
transition probability matrices and Kronecker products of probability distribution 
states within minimum-variance recursions (as described above) can be 
considered. 


11.7 Problems 


Problem 1. Calculate the transition probability matrices for Markov processes 
having the following sequences. 

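(a) $x_1 = 2$, $x_2 = 2$, $x_3 = 1$, $x_4 = 1$, $x_5 = 1$, $x_6 = 2$, $x_7 = 2$, $x_8 = 1$, $x_9 = 1$, $x_{10} = 1$.

(b) $x_1 = 1$, $x_2 = 2$, $x_3 = 3$, $x_4 = 1$, $x_5 = 2$, $x_6 = 3$, $x_7 = 1$, $x_8 = 2$, $x_9 = 3$.

(c) $x_1 = 1$, $x_2 = 2$, $x_3 = 3$, $x_4 = 2$, $x_5 = 1$, $x_6 = 2$, $x_7 = 3$, $x_8 = 2$, $x_9 = 1$.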


Problem 2. Suppose that a Markov process $x_k$ takes on index values in the set 
{1, 2, 3} and its transition probability matrix is

$$A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}.$$

For an initial state $x_1 = 3$, calculate the probability that $x_2 = 3$, $x_3 = 3$, $x_4 = 1$, 
$x_5 = 1$, $x_6 = 3$, $x_7 = 2$ and $x_8 = 1$.


Problem 3. Consider the transition matrix $A = \begin{bmatrix} r & 1-r \\ 1-r & r \end{bmatrix}$, where 0 < r < 1. Prove 
by induction that the combined transition matrix after m steps is

$$A^m = \begin{bmatrix} 0.5 + 0.5(2r-1)^m & 0.5 - 0.5(2r-1)^m \\ 0.5 - 0.5(2r-1)^m & 0.5 + 0.5(2r-1)^m \end{bmatrix}.$$


Problem 4. Consider the transition matrix a-| 
7. 


aa |: where 0 <r, s < 1. Find 
=) 


the associated stationary probability distribution $\pi$ that satisfies $\pi = A\pi$. 


Problem 5. Derive expressions for (a) a@,(i) = Pr{x, =i,y,,...y,}3 (b) 3@ = 


Preys 


Yr |X, =i} and (c) 7,() = DO Pr{x, =i] yr} - 


j=l 


Problem 6. (a) Derive an expression for the one-step-ahead prediction error 
covariance in terms of an estimate $\hat{A}$ of $A$ and an estimate $\hat{C}$ of $C$. 

(b) Show that the prediction error covariance is minimised at $\hat{A} = A$ and $\hat{C} = C$. 


11.8 Glossary 
$\pi_k$    Probability distribution vector (state) at time k, which 
    satisfies $\pi_k = A\pi_{k-1}$, where A is a transition probability 
    matrix. 
$\pi$    Stationary probability distribution vector state at time k, 
    which satisfies $\pi = A\pi$. 
$\alpha_k(i)$    i-th component of the (forward) probability distribution 
    vector at time k. 
$c_k(i)$    i-th component of the conditional observation likelihood at 
    time k. 
$\beta_k(i)$    i-th component of the backward probability distribution 
    vector at time k. 
$\gamma_k(i)$    i-th component of the smoothed probability distribution 
    vector at time k. 
O(F.,H,) Observability matrix, where F’, and H, respectively denote 
n-order probability transition matrix and output mapping. 
OF "HY Reachability matrix. 
$x_k \otimes x_{k-1}$    Kronecker tensor product of $x_k$ and $x_{k-1}$. 